COURSE 4: THE POWER OF STATISTICS

Module 5: Introduction to Hypothesis Testing

GOOGLE ADVANCED DATA ANALYTICS PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

Introduction to Hypothesis Testing

In this module, you will learn about hypothesis testing, a tool data professionals use to determine whether the results of a test or experiment are statistically significant or could plausibly have occurred by chance. The overview walks through the basic steps of any hypothesis test: stating the null and alternative hypotheses, choosing a significance level, finding the p-value, and rejecting or failing to reject the null hypothesis.

You will also see how hypothesis testing helps data professionals draw meaningful conclusions from data and support informed decision-making. By the end of the module, you should be able to conduct one-sample and two-sample hypothesis tests, including in Python, and interpret their results.

Learning Objectives

  • Use Python to conduct a hypothesis test
  • Describe how to conduct a two-sample hypothesis test
  • Describe how to conduct a one-sample hypothesis test
  • Explain the difference between a type I error and a type II error
  • Define concepts related to hypothesis testing such as significance level and p-value
  • Understand the role of statistical significance in hypothesis testing
  • Explain the difference between the null hypothesis and the alternative hypothesis

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: INTRODUCTION TO HYPOTHESIS TESTING

1. Fill in the blank: The _____ typically assumes that observed data does not occur by chance.

  • subjective hypothesis 
  • alternative hypothesis (CORRECT) 
  • null hypothesis  
  • objective hypothesis 

Correct: The alternative hypothesis typically assumes that observed data does not occur by chance. The alternative hypothesis is a statement that contradicts the null hypothesis. It is accepted as true only if there is convincing evidence for it. 

2. Which of the following statements describe significance level? Select all that apply.  

  • Significance level is the threshold at which a result is considered to be due to chance.  
  • Significance level is the probability of rejecting an alternative hypothesis when it is true. 
  • Significance level is the threshold at which a result is considered statistically significant. (CORRECT) 
  • Significance level is the probability of rejecting a null hypothesis when it is true. (CORRECT) 

Correct: Significance level is the threshold at which a result is considered statistically significant. It is also the probability of rejecting a null hypothesis when it is true. 

3. What concept refers to the probability of observing results that are at least as extreme as those observed when the null hypothesis is true?  

  • P-value (CORRECT) 
  • Statistical significance 
  • Confidence level  
  • Z-score 

Correct: P-value refers to the probability of observing results that are at least as extreme as those observed when the null hypothesis is true.
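
To make the definition concrete, here is a minimal sketch of how a two-sided p-value could be computed from a z-score. It assumes the test statistic follows a standard normal distribution when the null hypothesis is true; the helper name two_sided_p_value is hypothetical and not part of the course material.

```python
from scipy import stats


def two_sided_p_value(z_score: float) -> float:
    """Probability, under a standard normal null distribution, of a result
    at least as extreme as z_score in either tail."""
    # norm.sf(x) is the upper-tail probability 1 - cdf(x); doubling it
    # accounts for equally extreme results in the lower tail.
    return 2 * stats.norm.sf(abs(z_score))


p_value = two_sided_p_value(1.96)
print(f"p-value for z = 1.96: {p_value:.4f}")
```

A z-score of 1.96 gives a p-value of roughly 0.05, which is why 1.96 is the familiar cutoff for a two-sided test at the 5% significance level.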

4. A data professional conducts a hypothesis test. They mistakenly conclude that their result is statistically significant when it actually occurred by chance. What type of error does this scenario describe?  

  • Type I (CORRECT) 
  • Type II 
  • Type III 
  • Type IV 

Correct: This scenario describes a type I error. A type I error, also known as a false positive, occurs when a null hypothesis that is actually true is rejected.
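
The link between type I errors and the significance level can be illustrated with a small simulation. This is only a sketch: the sample size, number of simulated tests, and random seed are arbitrary assumptions, and every sample is deliberately drawn from a population for which the null hypothesis is true.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
alpha = 0.05                       # significance level
n_tests, sample_size = 2000, 30    # arbitrary simulation settings

# Each row is one sample from a population whose mean really is 0,
# so the null hypothesis (population mean == 0) is true for every test.
samples = rng.normal(loc=0.0, scale=1.0, size=(n_tests, sample_size))
result = stats.ttest_1samp(samples, popmean=0.0, axis=1)

# Any rejection here is a type I error (false positive).
false_positive_rate = np.mean(result.pvalue < alpha)
print(f"Share of true null hypotheses rejected: {false_positive_rate:.3f}")
```

The share of rejected tests should land close to alpha, which matches the definition of significance level as the probability of rejecting a null hypothesis when it is true.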

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: ONE-SAMPLE TESTS

1. In a one-sample hypothesis test, what does the null hypothesis state?  

  • The population mean is not equal to an observed value. 
  • The population mean is equal to an observed value. (CORRECT) 
  • The population mean is greater than an observed value.  
  • The population mean is less than an observed value.  

Correct: In a one-sample hypothesis test, the null hypothesis states that the population mean is equal to an observed value. 

2. A data professional conducts a hypothesis test. They discover that their p-value is less than the significance level. What conclusion should they draw?  

  • Reject the null hypothesis. (CORRECT)  
  • Reject the alternative hypothesis.  
  • Fail to reject the null hypothesis.  
  • Decide the test is inconclusive.

Correct: To draw a conclusion about a hypothesis test, compare the p-value with the significance level. If the p-value is less than the significance level, reject the null hypothesis. If the p-value is greater than the significance level, fail to reject the null hypothesis.  
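
A one-sample test with this decision rule might look like the following sketch. The delivery-time data, the claimed mean of 30 minutes, and the 5% significance level are all made-up values for illustration; scipy.stats.ttest_1samp performs a two-sided one-sample t-test by default.

```python
from scipy import stats

# Hypothetical sample: delivery times in minutes (made-up numbers).
delivery_times = [31.2, 29.8, 33.5, 30.1, 32.7, 34.0, 28.9, 31.8, 33.1, 30.6]
claimed_mean = 30.0   # value stated in the null hypothesis
alpha = 0.05          # significance level, chosen before finding the p-value

# H0: population mean == claimed_mean; H1: population mean != claimed_mean.
result = stats.ttest_1samp(delivery_times, popmean=claimed_mean)

# Compare the p-value with the significance level to draw a conclusion.
if result.pvalue < alpha:
    decision = "reject the null hypothesis"
else:
    decision = "fail to reject the null hypothesis"

print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}: {decision}")
```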

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: TWO-SAMPLE TESTS

1. What does a two-sample hypothesis test determine?  

  • Whether a population parameter, such as a mean or proportion, is equal to a specific value
  • Whether a sample statistic, such as a mean or proportion, is equal to a specific value
  • Whether two population parameters, such as two means or two proportions, are equal (CORRECT)
  • Whether two sample statistics, such as two means or two proportions, are equal

Correct: A two-sample hypothesis test determines whether two population parameters, such as two means or two proportions, are equal.

2. What is the null hypothesis of a two-sample t-test?

  • The population mean is equal to an observed value
  • There is no difference between two population proportions
  • There is no difference between two population means (CORRECT)
  • The population proportion is equal to an observed value

Correct: In a two-sample t-test, the null hypothesis states that there is no difference between two population means.

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: HYPOTHESIS TESTING WITH PYTHON

1. A data professional can use the Python function scipy.stats.ttest_ind() to compute the p-value for the two-sample t-test.

  • True (CORRECT)
  • False

Correct: A data professional can use the Python function scipy.stats.ttest_ind() to conduct a two-sample t-test and compute its p-value. The p-value is the probability of observing a difference in sample means as extreme as or more extreme than the observed difference when the null hypothesis is true.

2. What arguments of the Python function scipy.stats.ttest_ind(a, b, equal_var) refer to observations from the sample data? Select all that apply.

  • alpha 
  • loc 
  • a (CORRECT)
  • b (CORRECT)

Correct: In the function scipy.stats.ttest_ind(a, b, equal_var), a refers to observations from the first sample; b refers to observations from the second sample; equal_var indicates whether the population variance of the two samples is assumed to be equal. 
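
As a rough illustration of these arguments, the sketch below runs a two-sample t-test on two small made-up samples. The sales figures and the 5% significance level are assumptions chosen for the example; note that a and b are array-like collections of observations rather than single numbers.

```python
from scipy import stats

# Hypothetical sales figures (in thousands) for two restaurants; made-up numbers.
restaurant_a = [52.1, 49.8, 53.4, 51.0, 50.7, 54.2, 48.9, 52.6]
restaurant_b = [55.3, 57.1, 54.8, 56.4, 58.0, 53.9, 56.7, 55.5]
alpha = 0.05  # significance level

# H0: the two population means are equal; H1: they differ.
# equal_var=False requests Welch's t-test, which does not assume that
# the two populations have equal variances.
result = stats.ttest_ind(a=restaurant_a, b=restaurant_b, equal_var=False)

reject_null = result.pvalue < alpha
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.5f}, reject H0: {reject_null}")
```

Setting equal_var=True instead would run the standard pooled-variance t-test, which is only appropriate when the equal-variance assumption is reasonable.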

MODULE 5 CHALLENGE

1. Which of the following statements accurately describe the null hypothesis? Select all that apply.

  • The null hypothesis typically assumes that observed data does not occur by chance. 
  • The null hypothesis is accepted as true only if there is convincing evidence for it. 
  • The null hypothesis is assumed to be true unless there is convincing evidence to the contrary. (CORRECT)
  • The null hypothesis typically assumes that observed data occurs by chance. (CORRECT)

2. What term describes the probability of rejecting the null hypothesis when it is true?

  • P-value
  • Confidence interval
  • Alternative hypothesis
  • Significance level (CORRECT)

3. A data professional conducts a hypothesis test. They fail to reject the null hypothesis. What statement best describes their conclusion?

  • Their significance level is greater than their p-value
  • Their confidence level is greater than their p-value
  • Their p-value is greater than their significance level. (CORRECT)
  • Their p-value is greater than their confidence level

4. A data professional conducts a hypothesis test. When they draw their conclusion, they commit a type I error. Which of the following statements describe their error? Select all that apply.

  • They fail to reject a null hypothesis that is actually false.
  • They conclude their result occurred by chance when in fact it is statistically significant.
  • They reject a null hypothesis that is actually true. (CORRECT)
  • They conclude their result is statistically significant when in fact it occurred by chance. (CORRECT)

5. A data professional at an emergency response center conducts a hypothesis test to identify optimal ambulance routes. They just found the p-value. What should they do next?

  • Choose the significance level
  • State the alternative hypothesis
  • State the null hypothesis
  • Reject or fail to reject the null hypothesis (CORRECT)

6. A data professional conducts a hypothesis test. They choose a significance level of 10%. They calculate a p-value of 12.4%. What conclusion should they draw?

  • Reject the alternative hypothesis.
  • Fail to reject the null hypothesis. (CORRECT)
  • Fail to reject the alternative hypothesis.
  • Reject the null hypothesis

7. A data professional is conducting a two-sample t-test. What does their alternative hypothesis state?

  • There is no difference between two population means.
  • There is a difference between two population proportions.
  • There is no difference between two population proportions.
  • There is a difference between two population means. (CORRECT)

8. A data professional conducts a hypothesis test to compare the mean annual sales of two different restaurants in the same restaurant chain. They write the following code:

scipy.stats.ttest_ind(a=530, b=550, equal_var=False)

What does the argument equal_var=False refer to?

  • Whether or not the population variance of the two samples is assumed to be equal (CORRECT)
  • Significance level
  • P-value
  • Observations from the first sample

9. Which of the following statements accurately describe the null hypothesis and the alternative hypothesis? Select all that apply.

  • The alternative hypothesis typically assumes that observed data occurs by chance.
  • The null hypothesis typically assumes that observed data does not occur by chance.
  • The null hypothesis typically assumes that observed data occurs by chance. (CORRECT)
  • The alternative hypothesis typically assumes that observed data does not occur by chance. (CORRECT)

10. To draw a conclusion about the null hypothesis, what two concepts are compared?  

  • Confidence level and significance level
  • P-value and significance level (CORRECT)  
  • P-value and alternative hypothesis 
  • Alternative hypothesis and significance level

11. A data professional conducts a hypothesis test to compare the mean annual sales of two different restaurants in the same restaurant chain. They write the following code:

scipy.stats.ttest_ind(a=530, b=550, equal_var=False)

What does the argument a=530 refer to? 

  • Whether or not the population variance of the two samples is assumed to be equal
  • Significance level
  • P-value
  • Observations from the first sample (CORRECT)

12. What is the term for the arbitrary threshold determining whether an observed difference between groups occurred by chance?

  • P-value
  • Maximum likelihood
  • Statistical significance (CORRECT)
  • Confidence level

13. A data professional conducts a hypothesis test. When they draw their conclusion, they fail to reject a null hypothesis, which is actually false. What type of error do they commit?

  • Type I
  • Type III
  • Type II (CORRECT)
  • Type IV

14. A data professional conducts a hypothesis test. They choose a significance level of 5%. They calculate a p-value of 3.3%. What conclusion should they draw?

  • Reject the alternative hypothesis.
  • Fail to reject the null hypothesis.
  • Reject the null hypothesis. (CORRECT)
  • Fail to reject the alternative hypothesis.

15. In a one-sample hypothesis test of the mean, what are the typical options for the alternative hypothesis? Select all that apply.

  • The population mean is equal to an observed value.
  • The population mean is greater than an observed value. (CORRECT)
  • The population mean is less than an observed value. (CORRECT)
  • The population mean is not equal to an observed value. (CORRECT)

16. A data professional conducts a hypothesis test. They choose a significance level of 1%. They calculate a p-value of 0.01%. What conclusion should they draw?

  • Fail to reject the null hypothesis.
  • Reject the alternative hypothesis.
  • Fail to reject the alternative hypothesis.
  • Reject the null hypothesis. (CORRECT)

17. A data professional is conducting a hypothesis test. Their null hypothesis states that there is no difference between two population proportions. What type of test are they conducting?

  • Two-sample z-test (CORRECT)
  • Two-sample t-test
  • One-sample z-test
  • One-sample t-test
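
For reference, a two-sample z-test for proportions can be sketched by hand with the pooled-proportion formula. The function name two_proportion_z_test and the counts below are hypothetical, and the normal approximation assumes reasonably large samples.

```python
from math import sqrt

from scipy import stats


def two_proportion_z_test(successes_1, n_1, successes_2, n_2):
    """Two-sided z-test of H0: the two population proportions are equal."""
    p_1, p_2 = successes_1 / n_1, successes_2 / n_2
    # Under H0 both groups share one proportion, so pool the data to estimate it.
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z_score = (p_1 - p_2) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * stats.norm.sf(abs(z_score))
    return z_score, p_value


# Hypothetical counts: 45 successes out of 500 vs. 70 out of 500 (made-up).
z_score, p_value = two_proportion_z_test(45, 500, 70, 500)
print(f"z = {z_score:.3f}, p = {p_value:.4f}")
```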

18. What does the concept of p-value refer to?  

  • The probability of observing results as or more extreme than those observed when the null hypothesis is true (CORRECT)
  • The probability of observing results less extreme than those observed when the null hypothesis is true
  • The probability of rejecting the null hypothesis when it is false
  • The probability of rejecting the null hypothesis when it is true

19. When would a data professional reject the null hypothesis?

  • When their test statistic is less than their p-value
  • When their significance level is less than their p-value
  • When their p-value is less than their test statistic
  • When their p-value is less than their significance level (CORRECT)

20. A data professional on a marketing team conducts a hypothesis test to compare the mean time customers spend on two different versions of a company’s website. To start, they state the null hypothesis and the alternative hypothesis. What should they do next?

  • Reject or fail to reject the null hypothesis
  • Find the margin of error
  • Choose a significance level (CORRECT)
  • Find the p-value

21. A data professional conducts a hypothesis test to compare the mean annual sales of two different restaurants in the same restaurant chain. They write the following code:

scipy.stats.ttest_ind(a=530, b=550, equal_var=False)

What does the argument b=550 refer to? 

  • Observations from the second sample (CORRECT)
  • P-value
  • Whether or not the population variance of the two samples is assumed to be equal
  • Significance level

22. A data professional conducts a hypothesis test. When they draw their conclusion, they commit a type II error. Which of the following statements accurately describe this scenario? Select all that apply.

  • They have made an error known as a false positive.
  • They have made an error known as a false negative. (CORRECT)
  • They have failed to reject a null hypothesis, which is actually false. (CORRECT)
  • They concluded their result occurred by chance, but it was actually statistically significant. (CORRECT)

23. A data analytics team in the landscaping industry conducts a hypothesis test to compare the effects of certain fertilizers on flower production. To start, they state the null hypothesis and the alternative hypothesis. Then they choose a significance level. What should they do next?

  • Reject or fail to reject the null hypothesis
  • Find the p-value (CORRECT)
  • Select the sample data
  • Identify the confirmed assumption

24. What type of hypothesis typically assumes that observed data does not occur by chance?  

  • Type II  
  • Alternative (CORRECT)
  • Null
  • Type I

25. The null hypothesis is a statement that is assumed to be true unless there is convincing evidence to the contrary. The null hypothesis typically assumes that observed data occurs by chance.  

  • True (CORRECT)
  • False

Correct: The null hypothesis is a statement that is assumed to be true unless there is convincing evidence to the contrary. The null hypothesis typically assumes that observed data occurs by chance.

26. What is the first step when conducting a hypothesis test?  

  • Find the p-value 
  • Reject or fail to reject the null hypothesis 
  • Choose a significance level  
  • State the null hypothesis and the alternative hypothesis (CORRECT) 

Correct: The first step when conducting a hypothesis test is to state the null hypothesis and alternative hypothesis. The following three steps are to choose a significance level, find the p-value, and either reject or fail to reject the null hypothesis.  
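
The final step, rejecting or failing to reject the null hypothesis, reduces to a single comparison. The sketch below uses a hypothetical helper, draw_conclusion, and applies it to the three significance-level and p-value pairs that appear in this module's challenge questions (10% and 12.4%, 5% and 3.3%, 1% and 0.01%), written as proportions.

```python
def draw_conclusion(p_value: float, significance_level: float) -> str:
    """Last step of a hypothesis test: compare the p-value with the
    significance level chosen earlier."""
    if p_value < significance_level:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# (significance level, p-value) pairs expressed as proportions.
scenarios = [(0.10, 0.124), (0.05, 0.033), (0.01, 0.0001)]
for alpha, p_value in scenarios:
    print(f"alpha = {alpha}, p = {p_value}: {draw_conclusion(p_value, alpha)}")
```

The significance level is chosen before the p-value is found, so it is passed in as a fixed input rather than adjusted after the result is seen.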

27. To compare two population means, a data professional uses a one-sample hypothesis test. 

  • True 
  • False (CORRECT)

Correct: To compare two population means, a data professional uses a two-sample hypothesis test. A two-sample test determines whether two population parameters, such as two means, are equal. A one-sample test determines whether a population parameter is equal to a specific value. 

CONCLUSION to Introduction to Hypothesis Testing

In conclusion, this module introduced hypothesis testing and its basic steps: stating the null and alternative hypotheses, choosing a significance level, finding the p-value, and rejecting or failing to reject the null hypothesis. With these steps, data professionals can judge whether an experimental result is statistically significant or could plausibly have occurred by chance, and draw conclusions from data that support informed decision-making.

Hypothesis testing is a core part of the analytical toolkit because it supports an evidence-based approach to business problems. The one-sample and two-sample tests covered here, along with their Python implementations, provide the foundation for applying hypothesis testing with confidence.