COURSE 5: REGRESSION ANALYSIS: SIMPLIFY COMPLEX DATA RELATIONSHIPS

Module 4: Advanced Hypothesis Testing

GOOGLE ADVANCED DATA ANALYTICS PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

INTRODUCTION – Advanced Hypothesis Testing

In this section, participants extend their hypothesis testing skills with two additional families of statistical tests: chi-squared tests and analysis of variance (ANOVA). Chi-squared tests work with categorical variables, while ANOVA compares the means of a continuous variable across groups defined by categorical variables. Together, these tests let data professionals analyze a wider range of data types and scenarios.

The section covers two chi-squared tests (the goodness of fit test and the test for independence) and two ANOVA tests (one-way and two-way), building on the theoretical foundation laid in earlier sections. It also introduces post hoc tests and the ANOVA extensions ANCOVA, MANOVA, and MANCOVA. By working through these tests, participants strengthen their statistical skills and learn to select and apply the most appropriate test for a given analysis, so they can make informed decisions based on statistical evidence.

Learning Objectives

  • Perform a χ² (“chi-squared”) goodness of fit test
  • Perform a χ² test of independence
  • Describe when to use analysis of variance (ANOVA) testing
  • Perform a one-way ANOVA test
  • Perform a two-way ANOVA test
  • Run post hoc tests with ANOVA
  • Define ANCOVA, MANOVA, and MANCOVA
  • Distinguish between ANOVA, ANCOVA, MANOVA, and MANCOVA

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: THE CHI-SQUARED TEST

1. The chi-squared goodness of fit test determines whether an observed categorical variable follows an expected distribution.

  • True (CORRECT)
  • False

Correct: The chi-squared goodness of fit test determines whether an observed categorical variable follows an expected distribution. The test’s null hypothesis states that the variable follows the expected distribution. The alternative hypothesis states that the variable doesn’t follow the expected distribution.
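As a minimal sketch of the goodness of fit test in Python, assuming SciPy is installed: the die-roll counts below are hypothetical, and `scipy.stats.chisquare` compares the observed counts with the counts expected under the null hypothesis (here, a fair die).

```python
from scipy import stats

# Hypothetical data: how often each face appeared in 60 rolls of a die.
observed = [8, 12, 9, 11, 10, 10]
# Expected counts under H0 (a fair die): 60 rolls / 6 faces = 10 per face.
expected = [10, 10, 10, 10, 10, 10]

result = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-squared = {result.statistic:.3f}, p-value = {result.pvalue:.4f}")

alpha = 0.05  # significance level chosen for this example
if result.pvalue < alpha:
    print("Reject H0: the counts do not follow the expected distribution.")
else:
    print("Fail to reject H0: the counts are consistent with a fair die.")
```

The observed and expected counts must add up to the same total (60 here) for the comparison to make sense.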

2. Which test determines whether two categorical variables are associated with each other?

  • Chi-squared test for independence (CORRECT)
  • Chi-squared alternative of fit test
  • Chi-squared goodness of fit test
  • Chi-squared test for dependence

Correct: The chi-squared test for independence determines whether two categorical variables are associated with each other. The test’s null hypothesis is that the variables are independent. The alternative hypothesis states that the variables are not independent and are therefore associated with each other.
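A minimal sketch of the test for independence, assuming SciPy is installed. The contingency table (region by preferred product) is hypothetical; `scipy.stats.chi2_contingency` derives the expected counts from the row and column totals and returns the statistic, p-value, degrees of freedom, and expected table.

```python
from scipy import stats

# Hypothetical contingency table: rows are regions (North, South),
# columns are preferred product (A, B, C).
observed = [
    [25, 15, 10],
    [15, 25, 10],
]

# Expected count for each cell = row total * column total / grand total.
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi-squared = {chi2:.3f}, dof = {dof}, p-value = {p:.4f}")
print("Expected counts under independence:")
print(expected)
```

Here the statistic is 5.0 with 2 degrees of freedom and p ≈ 0.082, so a test at α = 0.05 would fail to reject the null hypothesis that region and product preference are independent.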

3. Fill in the blank: The chi-squared statistic equals the sum of the observed number minus the expected number, squared, divided by the _____ number.

  • Observed
  • Hypothesis
  • Expected (CORRECT)
  • Predicted

Correct: The chi-squared statistic equals the sum of the observed number minus the expected number, squared, divided by the expected number.
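In symbols, χ² = Σ (O − E)² / E, where O and E are the observed and expected counts and the sum runs over every category (or every cell of a contingency table). A minimal pure-Python sketch of that arithmetic, using hypothetical die-roll counts (60 rolls, 10 expected per face):

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 60 rolls of a die; a fair die expects 10 per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

# Per-face terms: 4/10 + 4/10 + 1/10 + 1/10 + 0/10 + 0/10 = 1.0
statistic = chi_squared_statistic(observed, expected)
print(statistic)
```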

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: ANALYSIS OF VARIANCE

1. Which of the following statements accurately describe t-tests and analyses of variance? Select all that apply.

  • A t-test can test means between several groups.
  • An analysis of variance test can only test the difference in means between two groups.
  • An analysis of variance test can test means between several groups. (CORRECT)
  • A t-test can only test the difference in means between two groups. (CORRECT)

Correct: A t-test can only test the difference in means between two groups. An analysis of variance test can test means between several groups.
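To see how the two tests relate, here is a minimal sketch assuming SciPy is installed and using hypothetical exam scores for three teaching methods. `scipy.stats.ttest_ind` compares exactly two groups, while `scipy.stats.f_oneway` (one-way ANOVA) accepts any number of groups. With only two groups, the ANOVA F statistic equals the square of the equal-variance t statistic, which is one way to see ANOVA as an extension of the t-test.

```python
from scipy import stats

# Hypothetical exam scores under three teaching methods.
group_a = [78, 82, 85, 88, 90]
group_b = [72, 75, 80, 83, 85]
group_c = [65, 70, 72, 75, 78]

# Two-sample t-test (equal variances assumed by default): exactly two groups.
t_result = stats.ttest_ind(group_a, group_b)

# One-way ANOVA: compares the means of all three groups in a single test.
anova_result = stats.f_oneway(group_a, group_b, group_c)

# With just two groups, one-way ANOVA gives F = t^2 and the same p-value.
two_group_anova = stats.f_oneway(group_a, group_b)

print(f"t-test (A vs B): t = {t_result.statistic:.3f}, p = {t_result.pvalue:.4f}")
print(f"ANOVA (A, B, C): F = {anova_result.statistic:.3f}, p = {anova_result.pvalue:.4f}")
print(f"ANOVA (A, B): F = {two_group_anova.statistic:.3f}, t^2 = {t_result.statistic ** 2:.3f}")
```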

2. Which of the following are analysis of variance (ANOVA) tests? Select all that apply.

  • Half-way ANOVA
  • Five-way ANOVA
  • Two-way ANOVA (CORRECT)
  • One-way ANOVA (CORRECT)

Correct: One-way ANOVA and two-way ANOVA are types of analysis of variance tests. Analysis of variance, commonly called ANOVA, is a group of statistical techniques that test the difference of means between three or more groups.

3. Fill in the blank: A post hoc test performs a pairwise comparison between all available groups while controlling for the _____.

  • Tukey’s HSD
  • variable selection
  • error rate (CORRECT)
  • confidence interval

Correct: A post hoc test performs a pairwise comparison between all available groups while controlling for the error rate. Each individual comparison carries a small chance of falsely rejecting the null hypothesis (a Type I error), and that chance grows as the number of comparisons increases. A post hoc test such as Tukey’s HSD controls for this increasing probability.
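Tukey’s honestly significant difference (HSD) test is a common post hoc test after a significant one-way ANOVA. A minimal sketch, assuming statsmodels is installed and using hypothetical scores for three teaching methods:

```python
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical exam scores under three teaching methods (5 students each).
scores = [78, 82, 85, 88, 90,
          72, 75, 80, 83, 85,
          65, 70, 72, 75, 78]
methods = ["A"] * 5 + ["B"] * 5 + ["C"] * 5

# Tukey's HSD compares every pair of groups (A-B, A-C, B-C) while keeping
# the family-wise error rate at alpha across all of the comparisons.
tukey = pairwise_tukeyhsd(endog=scores, groups=methods, alpha=0.05)
print(tukey.summary())

# One True/False entry per pair: True means reject H0 of equal means.
print(tukey.reject)
```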

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: ANCOVA, MANOVA, and MANCOVA

1. Which statistical technique better isolates the relationship between a single categorical variable of interest and the Y variable?

  • One-way ANOVA
  • Analysis of covariance (ANCOVA) (CORRECT)
  • Multivariate analysis of variance (MANOVA)
  • Multivariate analysis of covariance (MANCOVA)

Correct: Analysis of covariance (ANCOVA) better isolates the relationship between a single categorical variable of interest and the Y variable. By taking the covariate into account, the ANCOVA technique allows data professionals to draw more accurate conclusions about the relationships among variables.

2. Which of the following statements accurately describe ANCOVA and linear regression? Select all that apply.

  • Linear regression focuses on a continuous Y variable (CORRECT)
  • ANCOVA includes covariates to gain a clearer understanding of the categorical variable. (CORRECT)
  • ANCOVA allows for continuous and categorical independent variables (CORRECT)
  • Linear regression helps predict the Y variable for new, unseen data. (CORRECT)

Correct: All four statements are accurate. ANCOVA includes covariates to gain a clearer understanding of the categorical variable, and it allows both continuous and categorical independent variables. Linear regression focuses on a continuous Y variable and helps predict the Y variable for new, unseen data.

3. What is the key difference between MANCOVA and MANOVA?

  • MANCOVA includes a null hypothesis.
  • MANOVA has two or more continuous variables.
  • MANCOVA controls for covariates. (CORRECT)
  • MANOVA includes a categorical variable.

Correct: The key difference between MANCOVA and MANOVA is that MANCOVA controls for covariates. If a data professional is comparing two or more continuous outcome variables across a categorical variable of interest and wants to control for another variable, they can use MANCOVA.
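A minimal sketch of the difference, assuming pandas and statsmodels are installed. The data are hypothetical and loosely mirror the exercise-program scenario in the Module 4 challenge; the only change between the two models is adding the covariate `age` to the right-hand side of the formula.

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical data: two continuous outcomes (memory, fitness), one
# categorical factor (program), and one continuous covariate (age).
data = pd.DataFrame({
    "memory": [52, 50, 47, 45, 55, 53, 49, 46, 58, 55, 51, 50],
    "fitness": [60, 58, 57, 52, 62, 61, 55, 54, 70, 66, 65, 61],
    "program": ["yoga"] * 4 + ["tai_chi"] * 4 + ["swimming"] * 4,
    "age": [68, 72, 75, 80, 66, 70, 77, 81, 69, 73, 76, 79],
})

# MANOVA: do memory and fitness jointly differ by program?
manova_result = MANOVA.from_formula("memory + fitness ~ C(program)", data=data).mv_test()
print(manova_result)

# MANCOVA: the same question, with age added as a covariate.
mancova_result = MANOVA.from_formula("memory + fitness ~ C(program) + age", data=data).mv_test()
print(mancova_result)
```

Each printed result lists multivariate test statistics (such as Wilks’ lambda and Pillai’s trace) for every term in the model.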

QUIZ: MODULE 4 CHALLENGE

1. Fill in the blank: The _____ determines whether an observed categorical variable follows an expected distribution.

  • F-test
  • bias-variance test
  • chi-squared test for independence
  • chi-squared goodness of fit test (CORRECT)

2. What examines the relationship between categorical variables and continuous variables?

  • Explanatory variance
  • Analysis of variance (CORRECT)
  • Adjusted R-squared
  • Loss function   

3. A data analytics team at a technical support provider works to identify the expected outcome of a customer policy update. They compare the means of one continuous dependent variable based on three groups of two categorical variables. What type of test does this scenario describe?

  • One-way analysis of variance
  • Two-way analysis of variance (CORRECT)
  • Post hoc test
  • T-test

4. The post hoc test performs a pairwise comparison between all available groups while controlling for what?

  • mean
  • bias
  • error rate (CORRECT)
  • median

5. A data professional needs to answer a question about company financials. They study the relationship between categorical and continuous variables to control for the effect of variables that are unrelated to the financial question. What type of statistical technique do they use?

  • Analysis of independence
  • Analysis of covariance (CORRECT)
  • Analysis of variance
  • Analysis of regression

6. Fill in the blank: The acronym MANOVA means _____ analysis of variance.

  • mean
  • model
  • multiple
  • multivariate (CORRECT)

7. A data analyst wants to evaluate the effectiveness of different exercise programs on memory and fitness levels in elderly test subjects, controlling for age. She has two continuous dependent variables: memory score and fitness score. Her independent variable is the exercise program, which can be yoga, tai chi, or swimming. What type of test should she use?

  • MANCOVA (CORRECT)
  • MANOVA
  • ANOVA
  • ANCOVA

8. What is the group of statistical techniques that test the difference of means between three or more groups?

  • Analysis of variance (CORRECT)
  • Interactions of variance
  • Linearity of variance
  • Variance of selections

9. A data professional at an online retailer wants to understand the expected outcome of an upcoming sale. They perform a test that compares the means of one continuous dependent variable based on five groups of two categorical variables. What type of test does this scenario describe?

  • One-way analysis of variance
  • Two-way analysis of variance (CORRECT)
  • Post hoc test
  • T-test

10. What test performs a pairwise comparison between all available groups while controlling for the error rate?

  • Bias-variance test
  • Post hoc test (CORRECT)
  • Analysis of variance test
  • Chi-squared test

11. A data professional at an automotive manufacturer is asked to find a solution to a common manufacturing defect. They research the relationship between categorical and continuous variables to ensure all variables are relevant to the specific defect. What type of statistical technique do they use?

  • Analysis of covariance (CORRECT)
  • Analysis of variance
  • Analysis of independence
  • Analysis of regression

12. A data professional compares how two or more continuous variables vary according to categorical independent variables. What statistical technique are they using?

  • Analysis of variance
  • Analysis of variables
  • Multivariate analysis of variance (CORRECT)
  • Mean analysis of variables

13. Fill in the blank: The chi-squared goodness of fit test determines whether an observed _____ variable follows an expected distribution.

  • continuous
  • absolute
  • dependent
  • categorical (CORRECT)

14. A data analytics team wants to solve a problem about employee retention. They study the relationship between categorical and continuous variables to ensure all variables are relevant to the retention issues. What type of statistical technique do they use?  

  • Analysis of independence
  • Analysis of regression
  • Analysis of covariance (CORRECT)
  • Analysis of variance

15. Fill in the blank: When using _____, the independent variables must be categorical and the outcome variables must be continuous.

  • analysis of variance
  • multiple analysis of variables
  • multivariate analysis of variance (CORRECT)
  • analysis of variables

16. A researcher wants to evaluate the effectiveness of different job training programs on various skill outcomes. She has two continuous dependent variables: a technical skills score and a soft skills score. Her independent variable is the training program, which can be either in-person instruction or online instruction. What type of analysis should she use?

  • MANOVA (CORRECT)
  • ANCOVA
  • MANCOVA
  • ANOVA

17. A statistician wants to determine whether weight loss differs significantly across diets. His dependent variable is the amount of weight lost (in kg), and his independent variable is diet (vegan, low-carb, or omnivore). Which statistical test is most appropriate?

  • MANCOVA
  • 2-way ANOVA
  • 1-way ANOVA (CORRECT)
  • MANOVA

18. Fill in the blank: Analysis of variance examines the relationship between _____.

  • categorical and continuous variables (CORRECT)
  • dependent and independent variables
  • null and alternative variables
  • initial and second hypothesis variables

19. Fill in the blank: The chi-squared _____ of fit test determines whether an observed categorical variable follows an expected distribution.

  • goodness (CORRECT)
  • variance
  • bias
  • independence

20. A junior data analyst at a fabric supplier works to identify the expected outcome of a new product introduction. They compare the means of one continuous dependent variable based on four groups of two categorical variables. What type of test does this scenario describe?

  • One-way analysis of variance
  • Post hoc test
  • T-test
  • Two-way analysis of variance (CORRECT)

21. Fill in the blank: The chi-squared test for independence determines whether _____ categorical variables are associated with each other.

  • two or more
  • any number of
  • three
  • two (CORRECT)

Correct: The chi-squared test for independence determines whether two categorical variables are associated with each other.

22. Fill in the blank: Analysis of variance is a group of statistical techniques that test the difference of means between _____ groups.

  • three
  • an infinite number of
  • three or more (CORRECT)
  • two

Correct: Analysis of variance is a group of statistical techniques that test the difference of means between three or more groups. It extends the t-test, which compares the means of only two groups, to several groups at once.

23. Covariates are the variables that are directly relevant to the question to be answered in an analysis of covariance test.

  • True
  • False (CORRECT)

Correct: Covariates are the variables that are not of direct interest to the question to be answered. Analysis of covariance, or ANCOVA, is a statistical technique that tests the difference of means between three or more groups while controlling for the effects of covariates.

CONCLUSION – Advanced Hypothesis Testing

In conclusion, this section has given participants an advanced understanding of statistical hypothesis testing, with an emphasis on chi-squared tests and analysis of variance (ANOVA), along with post hoc tests and the related techniques ANCOVA, MANOVA, and MANCOVA. With this knowledge, participants can work with both categorical and continuous data and choose an appropriate technique to draw meaningful conclusions.

As participants apply these tests in practice, they refine their statistical skills and strengthen their ability to make data-driven decisions in real-world scenarios, giving them valuable tools to contribute to data analysis and decision-making as data professionals.