COURSE 4: THE POWER OF STATISTICS

Module 1: Introduction to Statistics

GOOGLE ADVANCED DATA ANALYTICS PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

Introduction to Statistics


This module introduces the role of statistics in data science. It begins with the distinction between descriptive statistics, which summarize the main features of a dataset, and inferential statistics, which use a sample to draw conclusions about a larger population. It then covers the relationship between population and sample, and between parameter and statistic.

The rest of the module turns to the descriptive statistics themselves: measures of central tendency (mean, median, and mode), measures of dispersion (range, variance, and standard deviation), and measures of relative position (percentile, quartile, and interquartile range). It closes with computing these statistics in Python.

Learning Objectives

  • Use Python to compute descriptive statistics
  • Determine measures of relative position such as percentile, quartile, and interquartile range
  • Determine measures of dispersion such as range, variance, and standard deviation
  • Determine measures of central tendency such as mean, median, and mode
  • Explain the relationship between parameter and statistic in inferential statistics
  • Explain the relationship between population and sample in inferential statistics
  • Explain the difference between descriptive statistics and inferential statistics

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: THE ROLE OF STATISTICS IN DATA SCIENCE

1. A data professional is analyzing real estate data. To estimate the mean rent of all the apartments in a large city, they calculate the mean rent of a random sample of 100 apartments. Which of the following best describes this statistical method?

  • Inferential statistics (CORRECT)
  • A/B testing
  • Data cleaning
  • Descriptive statistics

Correct: This statistical method is inferential statistics, which makes inferences about a population based on a sample of the data.
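
The idea in this question can be sketched in a few lines of Python. The rent values below are synthetic and generated only for illustration (the population size, the average of 1500, and the spread of 300 are all assumptions, not figures from the course). The sketch draws a random sample of 100 and uses its mean to estimate the mean of the whole population.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch gives the same result on every run

# Hypothetical population: monthly rent for 50,000 apartments (synthetic values).
population_rents = [random.gauss(1500, 300) for _ in range(50_000)]

# Draw a simple random sample of 100 apartments, without replacement.
sample_rents = random.sample(population_rents, 100)

# The sample mean is a statistic. It estimates the population mean, a parameter.
sample_mean = statistics.mean(sample_rents)
population_mean = statistics.mean(population_rents)

print(f"Sample mean rent:     {sample_mean:.2f}")
print(f"Population mean rent: {population_mean:.2f}")
```

In practice the population mean is unknown, which is why the sample is needed. It is computed here only to show that the estimate lands close to it.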

2. In statistics, a population can only include people.  

  • True
  • False (CORRECT)

Correct: In statistics, a population can include people, objects, events, or measurements. 

3. The mean weight of an entire population of elephants is an example of which of the following concepts?

  • Measure of dispersion
  • Parameter (CORRECT)
  • Data visualization
  • Statistic

Correct: The mean weight of an entire population of elephants is an example of a parameter. A parameter is a characteristic of a population.

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: DESCRIPTIVE STATISTICS

1. A data professional is analyzing sales data for an online store. The most frequently occurring value in the dataset is $150. What term is used to describe this value?

  • Mode (CORRECT)
  • Interquartile range
  • Median
  • Variance

Correct!

2. What do measures of dispersion represent?

  • The relative position of the values in a dataset
  • The center of a dataset
  • The total number of values in a dataset
  • The spread of a dataset (CORRECT)

Correct: Measures of dispersion represent the spread of a dataset, or how spread out the values are from the center. 

3. Which of the following descriptive statistics are measures of position? Select all that apply.

  • Standard deviation
  • Mean
  • Percentile (CORRECT)
  • Quartile (CORRECT)

Correct: Measures of position include quartile and percentile. A quartile divides the values in a dataset into four equal parts. A percentile is the value below which a percentage of data falls. 
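
As a rough illustration of these two measures, the sketch below uses the quantiles() function from Python's standard statistics module on the wind speed values that appear in the module challenge later in this guide. The choice of method="inclusive" is an assumption made for the example. Other methods and other libraries can return slightly different cut points for the same data.

```python
import statistics

# Wind speeds in miles per hour, reused from the module challenge in this guide.
wind_speeds = [1, 8, 9, 14, 22, 28, 35, 46, 55, 60, 71]

# n=4 returns the three cut points (Q1, Q2, Q3) that split the data into four parts.
q1, q2, q3 = statistics.quantiles(wind_speeds, n=4, method="inclusive")
print(q1, q2, q3)  # 11.5 28.0 50.5

# n=100 returns 99 cut points; index 89 holds the 90th percentile.
percentiles = statistics.quantiles(wind_speeds, n=100, method="inclusive")
p90 = percentiles[89]
print(p90)  # 60.0

# Q2 is the 50th percentile, which is also the median.
print(q2 == statistics.median(wind_speeds))  # True
```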

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: CALCULATE STATISTICS WITH PYTHON

1. What two Python functions can you use to compute the range of your dataset?

  • mean() and min()
  • max() and std()
  • max() and min() (CORRECT)
  • max() and median()

Correct: To compute the range of your dataset, you subtract the minimum value from the maximum value. The max() function returns the maximum value. The min() function returns the minimum value. Range = max() – min().
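
A minimal sketch of this calculation, using Python's built-in max() and min() on the rainfall values from the module challenge later in this guide. The variable names are illustrative.

```python
# Daily rainfall in inches, reused from the module challenge in this guide.
rainfall = [1, 2.4, 3.2, 5, 2.8]

# Range = largest value minus smallest value.
rainfall_range = max(rainfall) - min(rainfall)

print(rainfall_range)  # 4
```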

2. What Python function can data professionals use to compute the mean, median, and standard deviation all at once?

  • std()
  • median()
  • mean()
  • describe() (CORRECT)

Correct: Data professionals can use the describe() function to compute the mean, median, and standard deviation all at once. In the pandas output, the median appears as the 50% row (the 50th percentile).
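
The sketch below shows what this looks like, assuming describe() refers to the pandas method and that pandas is installed. The five values are the small dataset used in the module challenge later in this guide.

```python
import pandas as pd

# Small numeric dataset, reused from the module challenge in this guide.
values = pd.Series([2, 2, 4, 7, 10])

# describe() returns count, mean, std, min, 25%, 50%, 75%, and max in one call.
summary = values.describe()
print(summary)

mean_value = summary["mean"]   # 5.0
median_value = summary["50%"]  # 4.0 (the 50th percentile is the median)
std_value = summary["std"]     # sample standard deviation (divides by n - 1)
```

Note that the summary has no row for the range. It can be derived from the max and min rows if needed.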

QUIZ: MODULE 1 CHALLENGE

1. A community college wants to improve student engagement with their new class schedule. They send a text alert to all students with a link to the same webpage. But half of the students get a text with information about the professors, and half get a text with information about newly available class times. What does this scenario describe?

  • Time series analysis
  • Hypothesis testing
  • Regression analysis
  • A/B testing (CORRECT)

Correct!
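
The random split at the heart of an A/B test can be sketched as follows. The student IDs and the group size are hypothetical, and the sketch covers only the assignment step, not the later comparison of results between the two groups.

```python
import random

random.seed(0)  # fixed seed so the split is reproducible

# Hypothetical list of 200 student IDs.
student_ids = list(range(1, 201))

# Shuffle a copy so that group membership is decided by chance.
shuffled = student_ids[:]
random.shuffle(shuffled)

# The first half receives version A of the text, the second half version B.
half = len(shuffled) // 2
group_a = shuffled[:half]
group_b = shuffled[half:]

print(len(group_a), len(group_b))  # 100 100
```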

2. Which of the following statements correctly describe key elements of inferential statistics? Select all that apply.

  • Sample size has minimal impact on the validity of test results.
  • A statistical population may refer to people, objects, or events. (CORRECT)
  • Data professionals use inferential statistics to predict behaviors. (CORRECT)
  • A sample is a subset of the larger population. (CORRECT)

Correct!

3. A data team at a high-tech manufacturer wants to better understand customer purchases of webcams over the past five years. Their dataset contains about 3.5 million rows of data about different customers and webcam products. The data team uses summary statistics to better understand the data. What does this scenario describe?

  • Inferential statistics
  • Statistical significance
  • Confidence intervals
  • Descriptive statistics (CORRECT)

Correct!

4. Fill in the blank: A _____ is a characteristic of a population.

  • sample
  • parameter (CORRECT)
  • measure
  • range

Correct!

5. A data professional working at an online store analyzes data for a monthly business intelligence report. They calculate the average time customers spend on the store’s website. What descriptive statistic are they using?

  • Range
  • Mean (CORRECT)
  • Mode
  • Standard deviation

Correct!

6. A data professional works with the following dataset: 2, 2, 4, 7, 10. What is the mean of the dataset?

  • 4
  • 5 (CORRECT)
  • 10
  • 2

Correct!

7. What concept best describes the standard deviation, variance, and range?

  • Measures of central tendency
  • Measures of frequency
  • Measures of dispersion (CORRECT)
  • Measures of position

Correct!

8. A data professional is analyzing wind speed data. Their dataset includes daily speeds in miles per hour over six months: 1, 8, 9, 14, 22, 28, 35, 46, 55, 60, 71. What is the range of their dataset?

  • 31.7
  • 28
  • 70 (CORRECT)
  • 349

Correct!

9. A data professional is analyzing data about annual work income in dollars. They divide the data into quartiles: Q1 = $40,000, Q2 = $55,000, Q3 = $70,000. What percentage of the values in their dataset are above $70,000?

  • 5%
  • 50%
  • 25% (CORRECT)
  • 75%

Correct!

10. If you apply the describe() function to numerical data, the results will include which of the following descriptive statistics? Select all that apply.

  • Range
  • Median (CORRECT)
  • Mean (CORRECT)
  • Standard deviation (CORRECT)

Correct!

11. A grocery delivery business wants to improve customer response rates for their company’s monthly postcard mailer. They send a postcard with the same information to all customers. But half of the customers get a headline about faster delivery speeds, and half get a headline about more delivery drivers available in their area. What does this scenario describe?

  • Regression analysis
  • Hypothesis testing
  • Time series analysis
  • A/B testing (CORRECT)

Correct!

12. Fill in the blank: A characteristic of a _____ is a parameter.

  • sample
  • measure
  • range
  • population (CORRECT)

Correct!

13. A data analytics team collects responses from a customer satisfaction survey that asked customers to rate their experience from 1 to 10. The analytics team arranges the values in the dataset from worst (1) to best (10). Then, they identify the middle value. What descriptive statistic are they using?

  • Mode
  • Minimum
  • Mean
  • Median (CORRECT)

Correct!

14. A data professional works with the following dataset: 2, 2, 4, 7, 10. What is the median of the dataset?

  • 5
  • 7
  • 2
  • 4 (CORRECT)

Correct!

15. A data professional is analyzing weather data. Their dataset includes daily rainfall in inches for the previous five days: 1, 2.4, 3.2, 5, 2.8. What is the range of their dataset?

  • 3.2
  • 5
  • 2.4
  • 4 (CORRECT)

Correct! 

16. A data professional is analyzing data about annual work income in dollars. They divide the data into quartiles: Q1 = $40,000, Q2 = $55,000, Q3 = $70,000. What value is the 50th percentile of their dataset?

  • $30,000
  • $40,000
  • $55,000 (CORRECT)
  • $70,000

Correct!

17. Which of the following statements correctly describes key elements of inferential statistics? Select all that apply.

  • Sample size has minimal effect on the validity of test results.
  • Data professionals use inferential statistics to predict behaviors. (CORRECT)
  • The dataset that a sample is drawn from is called the population. (CORRECT)
  • A sample can be used to draw conclusions about an entire population. (CORRECT)

Correct!

18. A data professional working for a water conservancy researches household water usage in a large city. Their dataset contains about 800,000 rows of data capturing how much water each household uses in a month. The data professional creates visualizations to quickly understand the data and create a summary for stakeholders. What does this scenario describe?

  • Statistical significance
  • Confidence intervals
  • Inferential statistics
  • Descriptive statistics (CORRECT)

Correct!

19. A company conducts an employee satisfaction survey. Employees rate their work experience as unacceptable, average, good, or excellent. The most frequently occurring value in the survey is excellent. What descriptive statistics concept best describes this value?

  • Standard deviation
  • Mode (CORRECT)
  • Median
  • Mean

Correct!

20. Which of the following descriptive statistics are measures of dispersion? Select all that apply.

  • Percentile
  • Standard deviation (CORRECT)
  • Variance (CORRECT)
  • Range (CORRECT)

Correct!

21. A data professional is analyzing data about annual work income in dollars. They divide the data into quartiles: Q1 = $40,000, Q2 = $55,000, Q3 = $70,000. What is the interquartile range, or IQR, of their dataset?

  • $15,000
  • $30,000 (CORRECT)
  • $40,000
  • $55,000

Correct!
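
The arithmetic behind this answer, written out as a short Python sketch. The quartile values come from the question; the variable names are illustrative.

```python
# Quartiles of annual income in dollars, as given in the question.
q1, q2, q3 = 40_000, 55_000, 70_000

# The interquartile range is the distance between the third and first quartiles.
iqr = q3 - q1
print(iqr)  # 30000

# By definition, 25% of values lie below Q1, 50% below Q2, and 75% below Q3,
# so 25% of values lie above Q3.
share_below = {"Q1": 0.25, "Q2": 0.50, "Q3": 0.75}
share_above_q3 = 1 - share_below["Q3"]
print(share_above_q3)  # 0.25
```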

22. A data team at a car dealership wants to improve open rates for their company’s weekly email campaign. They send two versions of the weekly email. Half of the customers get a subject line about new car colors, and half get a subject line about new car interiors. What does this scenario describe?

  • Hypothesis testing
  • Regression analysis
  • A/B testing (CORRECT)
  • Time series analysis

Correct!

23. Fill in the blank: A _____ is a characteristic of a sample.

  • range
  • parameter
  • measure
  • statistic (CORRECT)

Correct!

24. What do measures of dispersion, such as range and standard deviation, help a data professional understand about their data?

  • Minimum value
  • Center
  • Spread (CORRECT)
  • Maximum value

Correct!

25. A data team at a landscaping company investigates the most weather-resistant tree species in Canada. Their dataset contains more than 1 million rows of data about different trees. The data team creates a table to better understand what the data reveals. What does this scenario describe?

  • Descriptive statistics (CORRECT)
  • Inferential statistics
  • Statistical significance
  • Confidence intervals

Correct!

26. A data professional works with the following dataset: 2, 2, 4, 7, 10. What is the mode of the dataset?

  • 10
  • 2 (CORRECT)
  • 7
  • 4

Correct!
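
The three measures of central tendency for this dataset can be checked with Python's standard statistics module, as in this brief sketch.

```python
import statistics

# Dataset from the mean, median, and mode questions in this quiz.
data = [2, 2, 4, 7, 10]

mean_value = statistics.mean(data)      # sum of values divided by count: 25 / 5
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequently occurring value

print(mean_value, median_value, mode_value)  # 5 4 2
```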

27. A data professional is analyzing tomato growth data. Their dataset includes the circumference of tomatoes in millimeters: 40, 49, 50, 52, 66.3, 77.5, 78, 80. What is the range of their dataset?

  • 51
  • 500.3
  • 40 (CORRECT)
  • 62.5

Correct!

28. A data professional is analyzing data about the employees of a corporation. They want to compute the average age of all employees in the dataset. What Python functions can they use? Select all that apply.

  • max()
  • std()
  • mean() (CORRECT)
  • describe() (CORRECT)

Correct!

29. Which of the following statements correctly describe key elements of inferential statistics? Select all that apply.

  • Inferential statistics is the process of selecting a subset of data from a population.
  • Sampling is the process of selecting a subset of data from a population. (CORRECT) 
  • A population includes every possible element to be measured. (CORRECT)
  • Before conducting a test, data professionals choose the sample size. (CORRECT)

Correct!

30. If you apply the describe() function to categorical data, the results will include which of the following descriptive statistics?

  • Median
  • Mode (CORRECT)
  • Mean
  • Standard deviation

Correct!
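
A small sketch of this behavior, assuming describe() refers to the pandas method and that pandas is installed. The survey ratings are made up for the example. For text data, pandas reports the most frequent value in the row labeled top, which corresponds to the mode.

```python
import pandas as pd

# Hypothetical survey ratings (categorical data).
ratings = pd.Series(
    ["excellent", "good", "excellent", "average", "excellent", "unacceptable"]
)

# For non-numeric data, describe() returns count, unique, top, and freq.
summary = ratings.describe()
print(summary)

most_frequent = summary["top"]   # "excellent" (the mode)
occurrences = summary["freq"]    # 3 (how many times the mode occurs)
```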

31. Descriptive statistics enable data professionals to summarize the main features of a dataset.

  • True (CORRECT)
  • False

Correct: Descriptive statistics enable data professionals to summarize the main features of a dataset. They also describe the dataset so people can quickly understand large amounts of data.

32. Fill in the blank: The _____ is the average value in a dataset.

  • mode
  • sample
  • mean (CORRECT)
  • median

Correct: The mean is the average value in a dataset.

33. What descriptive statistic measures the spread of the values from the mean of a dataset?

  • Mode
  • Median
  • Range
  • Standard deviation (CORRECT)

Correct: Standard deviation measures the spread of the values from the mean of a dataset.
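
To make the calculation concrete, the sketch below works through variance and standard deviation step by step for the small dataset used elsewhere in this quiz, then compares the results with Python's standard statistics module. Whether to divide by n (population) or n - 1 (sample) depends on whether the data is the whole population or a sample from it; both versions are shown.

```python
import math
import statistics

# Small dataset, reused from the module challenge in this guide.
data = [2, 2, 4, 7, 10]

# Step 1: find the mean.
mean_value = sum(data) / len(data)  # 5.0

# Step 2: square each value's distance from the mean.
squared_deviations = [(x - mean_value) ** 2 for x in data]  # [9.0, 9.0, 1.0, 4.0, 25.0]

# Step 3: average the squared distances to get the variance.
population_variance = sum(squared_deviations) / len(data)    # divide by n
sample_variance = sum(squared_deviations) / (len(data) - 1)  # divide by n - 1

# Step 4: the standard deviation is the square root of the variance.
population_std = math.sqrt(population_variance)
sample_std = math.sqrt(sample_variance)

# The standard library gives the same results.
print(math.isclose(population_std, statistics.pstdev(data)))  # True
print(math.isclose(sample_std, statistics.stdev(data)))       # True
```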

34. What measure of position divides the values in a dataset into four equal parts?

  • Decile
  • Quintile
  • Quartile (CORRECT)
  • Percentile

Correct: A quartile divides the values in a dataset into four equal parts.

CONCLUSION – Introduction to Statistics

This module introduced the role of statistics in data science. It distinguished descriptive statistics, which summarize the main features of a dataset, from inferential statistics, which use a sample to draw conclusions about a population, and it defined the related terms: population, sample, parameter, and statistic.

The module also covered the core descriptive statistics: measures of central tendency (mean, median, and mode), measures of dispersion (range, variance, and standard deviation), and measures of position (percentile, quartile, and interquartile range), along with the Python functions used to compute them, such as max(), min(), mean(), and describe().