COURSE 4: THE POWER OF STATISTICS

Module 2: Probability

GOOGLE ADVANCED DATA ANALYTICS PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

INTRODUCTION – Probability

This module covers the fundamental principles of probability and their application in data analysis. It begins with the basic rules for calculating the probability of single events, then turns to the methods data professionals use to describe and analyze more complex events, with a focus on Bayes’ theorem.

The module also covers probability distributions, including the binomial, Poisson, and normal distributions, and shows how they reveal the structure of a dataset. By the end of the module, learners should be able to apply the basic rules of probability and interpret these distributions in data analysis work.

Learning Objectives

  • Use Python to model data with a probability distribution
  • Describe the significance and use of z-scores
  • Define the Empirical Rule
  • Describe the features and uses of continuous probability distributions such as the normal distribution
  • Describe the features and uses of discrete probability distributions such as the binomial and Poisson distributions
  • Explain the difference between discrete and continuous random variables
  • Describe Bayes’ theorem and its applications
  • Define dependent events
  • Describe conditional probability and its applications
  • Define different types of events such as mutually exclusive and independent events
  • Apply basic rules of probability such as the complement, addition, and multiplication rules
  • Describe basic probability in mathematical terms
  • Explain the difference between objective and subjective probability

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: BASIC CONCEPTS OF PROBABILITY

1. Objective probability is based on personal feeling, experience, or judgment.

  • True
  • False (CORRECT)

Correct: Subjective probability is based on personal feeling, experience, or judgment. Objective probability is based on statistics, experiments, and mathematical measurements.

2. Fill in the blank: In statistics, a number between _____ is used to express the probability that an event will occur.

  • -1 and 1
  • 0 and 1 (CORRECT)
  • -1 and 0
  • 1 and 2

Correct: The probability that an event will occur is expressed as a number between 0 and 1. If the probability of an event equals 0, there is a 0% chance that the event will occur. If the probability of an event equals 1, there is a 100% chance that the event will occur.

3. The probability of no snow tomorrow equals 1 minus the probability of snow tomorrow. This is an example of what rule of probability?

  • Division rule
  • Complement rule (CORRECT)
  • Multiplication rule
  • Addition rule

Correct: This is an example of the complement rule, which states that the probability that event A does not occur is 1 minus the probability of A. In statistics, the complement of an event is the event not occurring.
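
The complement rule can be sketched in a few lines of Python. The function name complement and the 30% snow forecast are assumptions made for illustration; they do not come from the course.

```python
def complement(p_event: float) -> float:
    """Return P(not A) given P(A), using the complement rule."""
    if not 0.0 <= p_event <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    return 1.0 - p_event


# Hypothetical forecast: a 30% chance of snow tomorrow.
p_snow = 0.3
p_no_snow = complement(p_snow)
print(f"P(no snow) = {p_no_snow:.2f}")  # P(no snow) = 0.70
```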

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: CONDITIONAL PROBABILITY

1. What is conditional probability?

  • The probability of two events occurring at the same time
  • The probability of an event occurring given that another event has already occurred (CORRECT)
  • The probability of a single random event occurring
  • The probability of a highly unlikely event occurring

Correct: Conditional probability refers to the probability of an event occurring given that another event has already occurred.

2. Suppose two events occur: The first event is drawing an ace from a standard deck of playing cards, and the second event is drawing another ace from the same deck. Note that the first ace is not reinserted into the deck after it is drawn. What term is used to describe these two events?

  • Subjective
  • Dependent (CORRECT)
  • Objective
  • Independent

Correct: These two events are described as dependent because drawing the first ace changes the probability of drawing the second ace. Two events are dependent if the occurrence of one event changes the probability of the other.
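
To see why the two draws are dependent, the probabilities can be worked out with exact fractions. This is a minimal sketch that assumes a standard 52-card deck with 4 aces; the variable names are illustrative.

```python
from fractions import Fraction

# A standard deck has 52 cards, 4 of which are aces.
p_first_ace = Fraction(4, 52)

# The first ace is not returned, so 3 aces remain among 51 cards.
p_second_ace_given_first = Fraction(3, 51)

# Multiplication rule for dependent events: P(A and B) = P(A) * P(B | A)
p_both_aces = p_first_ace * p_second_ace_given_first
print(p_both_aces)  # 1/221

# For comparison: if the first ace were put back, the draws would be
# independent and the second probability would not change.
p_both_with_replacement = p_first_ace * p_first_ace
print(p_both_with_replacement)  # 1/169
```

The second probability drops from 4/52 to 3/51 once the first ace is removed, which is exactly what makes the events dependent.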

3. Fill in the blank: _____ probability is the updated probability of an event based on new data.

  • Empirical
  • Classical
  • Posterior (CORRECT)
  • Prior

Correct: Posterior probability is the updated probability of an event based on new data. It is calculated by updating the prior probability using Bayes’ theorem.
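
The update from prior to posterior can be sketched as follows. The spam-filter scenario, the function name posterior, and all three input probabilities are hypothetical values chosen for illustration, not figures from the course.

```python
def posterior(prior: float, likelihood: float, likelihood_if_not: float) -> float:
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B).

    prior             -- P(A), the probability before seeing the data
    likelihood        -- P(B | A)
    likelihood_if_not -- P(B | not A)
    """
    # Total probability of the evidence B across both cases.
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 20% of emails are spam, a given word appears in
# 50% of spam emails and in 5% of non-spam emails.
p_spam_given_word = posterior(prior=0.2, likelihood=0.5, likelihood_if_not=0.05)
print(f"{p_spam_given_word:.3f}")  # 0.714
```

Under these assumed inputs, seeing the word raises the probability of spam from the prior of 0.2 to a posterior of about 0.714.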

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: DISCRETE PROBABILITY DISTRIBUTIONS

1. Which of the following statements describe continuous random variables? Select all that apply.

  • Continuous random variables are typically whole numbers.
  • Continuous random variables are typically negative numbers.
  • Continuous random variables are typically decimal values. (CORRECT)
  • Continuous random variables take all the possible values in some range of numbers. (CORRECT)

Correct: Continuous random variables take all the possible values in some range of numbers. Typically, these are decimal values that can be measured, such as height, weight, or time.

2. What probability distribution represents experiments with repeated trials that each have two possible outcomes: success or failure?

  • The trinomial distribution
  • The Poisson distribution
  • The binomial distribution (CORRECT)
  • The normal distribution

Correct: The binomial distribution represents experiments with repeated trials that each have two possible outcomes: success or failure.
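
As an illustration, the binomial probability of exactly k successes in n trials can be computed directly from its formula. This sketch uses only the standard library; the helper name binomial_pmf is an assumption, and the coin-flip numbers are an example.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Example: probability of exactly 5 heads in 10 fair coin flips.
print(binomial_pmf(5, 10, 0.5))  # 0.24609375

# The probabilities over all possible counts (0 to 10 heads) sum to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
print(total)
```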

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: CONTINUOUS PROBABILITY DISTRIBUTIONS

1. The normal distribution has which of the following features? Select all that apply.

  • The total area under the curve equals 4
  • The shape is a bell curve (CORRECT)
  • The curve is symmetrical on both sides of the center (CORRECT)
  • The mean is located at the center of the curve (CORRECT)

Correct: The normal distribution has the following features: the shape is a bell curve, the mean is located at the center of the curve, and the curve is symmetrical on both sides of the center. The normal distribution is the most common probability distribution in statistics because so many different kinds of datasets display a bell-shaped curve.

2. What does the empirical rule state?

  • For a dataset with a normal distribution, 68% of values fall within 1 standard deviation of the mean, 95% of values fall within 2 standard deviations of the mean, and 99.7% of values fall within 3 standard deviations of the mean. (CORRECT)
  • For a dataset with a normal distribution, 50% of values fall within 1 standard deviation of the mean, 30% of values fall within 2 standard deviations of the mean, and 20% of values fall within 3 standard deviations of the mean.
  • For a dataset with a normal distribution, 100% of values fall within 1 standard deviation of the mean.
  • For a dataset with a normal distribution, 33.3% of values fall within 1 standard deviation of the mean, 33.3% of values fall within 2 standard deviations of the mean, and 33.3% of values fall within 3 standard deviations of the mean.

Correct: The empirical rule states that, for a dataset with a normal distribution, 68% of values fall within 1 standard deviation of the mean, 95% of values fall within 2 standard deviations of the mean, and 99.7% of values fall within 3 standard deviations of the mean.
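
The percentages in the empirical rule can be checked against the normal distribution itself. The sketch below uses NormalDist from Python's standard statistics module; the helper name share_within is an assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

The rule's 68%, 95%, and 99.7% are rounded versions of these values.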

3. A data value is 2 standard deviations above the mean. What is its z-score?

  • 0
  • -2
  • 2 (CORRECT)
  • 1

Correct: Its z-score is 2. A z-score of 2 is 2 standard deviations above the mean. A z-score is a measure of how many standard deviations below or above the population mean a data point is.
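
A z-score for a single value follows from the definition: subtract the mean, then divide by the standard deviation. The sketch below assumes a mean of 150 g and a standard deviation of 10 g, the figures used in the smartphone questions of the module challenge; the function name z_score is illustrative.

```python
def z_score(value: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


# Assumed distribution: mean 150 g, standard deviation 10 g.
print(z_score(170, 150, 10))  # 2.0  (2 standard deviations above the mean)
print(z_score(120, 150, 10))  # -3.0 (3 standard deviations below the mean)
print(z_score(150, 150, 10))  # 0.0  (equal to the mean)
```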

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: PROBABILITY DISTRIBUTIONS WITH PYTHON

1. A data professional is working with a dataset that has a normal distribution. To test out the empirical rule, they want to find out if roughly 68% of the data values fall within 1 standard deviation of the mean. What Python functions will enable them to compute the mean and standard deviation?

  • mn() and std()
  • mn() and stand()
  • mean() and standard()
  • mean() and std() (CORRECT)

Correct: To compute the mean, they would use the mean() function; to compute the standard deviation, they would use the std() function.
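
The mean() and std() in this question most likely refer to the NumPy functions or pandas methods of those names. The sketch below performs the same check with the standard library's statistics.fmean() and statistics.pstdev() so that it runs without extra packages. The simulated data (mean 150, standard deviation 10, 10,000 values) and the random seed are assumptions made for illustration.

```python
import random
import statistics

# Simulate normally distributed data (hypothetical values, fixed seed for repeatability).
random.seed(42)
data = [random.gauss(150, 10) for _ in range(10_000)]

mean = statistics.fmean(data)
std = statistics.pstdev(data)  # population standard deviation

# Count the values that fall within 1 standard deviation of the mean.
lower, upper = mean - std, mean + std
share = sum(lower <= x <= upper for x in data) / len(data)
print(f"Share within 1 SD of the mean: {share:.1%}")
```

For data that is close to normal, the printed share should be roughly 68%.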

2. What Python function is used to compute z-scores for data?

  • stats.zscore() (CORRECT)
  • mean.zscore()
  • median.zscore()
  • normal.zscore()

Correct: The Python function stats.zscore() is used to compute z-scores for data. This function is part of the stats module in the SciPy package.
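
To show what a z-score computation over a whole dataset involves, here is a standard-library sketch. It assumes the default behavior of stats.zscore(), which uses the population standard deviation (ddof=0); the helper name z_scores and the sample weights are hypothetical. With SciPy installed, stats.zscore(weights) should return the same values as an array.

```python
import statistics


def z_scores(values: list[float]) -> list[float]:
    """Z-score of each value, using the population standard deviation (ddof=0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical sample of smartphone weights in grams.
weights = [140, 145, 150, 155, 160]
scores = z_scores(weights)
print([round(z, 3) for z in scores])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

The resulting z-scores always have a mean of 0 and a standard deviation of 1, which makes values from different datasets comparable.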

QUIZ: MODULE 2 CHALLENGE

1. A data professional is working for a large corporation. The marketing team asks them to predict the success of a new ad campaign. To make an informed prediction, they use statistics to analyze data on past ad campaigns. What type of probability are they using?

  • Dependent
  • Independent
  • Objective (CORRECT)
  • Subjective

2. The probability of an event is close to 1. Which of the following statements best describes the likelihood that the event will occur?

  • The event is unlikely to occur.
  • The event is certain to occur.
  • The event is certain not to occur.
  • The event is likely to occur. (CORRECT)

3. The probability of rain tomorrow is 40%. What is the probability of the complement of this event?

  • The probability of no rain tomorrow is 80%.
  • The probability of no rain tomorrow is 20%.
  • The probability of no rain tomorrow is 60%. (CORRECT)
  • The probability of no rain tomorrow is 40%.

4. Fill in the blank: Two events are _____ if the occurrence of one event does not change the probability of the other event.

  • continuous
  • independent (CORRECT)
  • discrete
  • dependent

5. Fill in the blank: To calculate posterior probability, a data professional can use _____ to update the prior probability based on the data.

  • the normal distribution
  • Bayes’ theorem (CORRECT)
  • the binomial distribution
  • the complement rule

6. Which of the following statements accurately describes a key difference between discrete and continuous random variables?

  • Discrete random variables are typically decimal values that can be measured; continuous random variables are typically whole numbers that can be counted.
  • Discrete random variables are typically whole numbers that can be counted; continuous random variables are typically decimal values that can be measured. (CORRECT)
  • Discrete random variables are positive numbers; continuous random variables are negative numbers.
  • Discrete random variables are negative numbers; continuous random variables are positive numbers.

7. The Poisson distribution can model which of the following kinds of data? Select all that apply.

  • The number of heads in 10 fair coin tosses
  • The number of calls per hour at a call center (CORRECT)
  • The number of visitors per day on a website (CORRECT)
  • The number of customers per week at a retail store (CORRECT)

8. A data professional working for a smartphone manufacturer is analyzing sample data on the weight of a specific smartphone. The data follows a normal distribution, with a mean weight of 150g and a standard deviation of 10g. According to the empirical rule, approximately what percentage of the data values lie between 140g and 160g?

  • 95%
  • 50%
  • 68% (CORRECT)
  • 99.7%

9. A data value has a z-score of 2.5. Where is it located?

  • 2.5 standard deviations below the median
  • 2.5 standard deviations above the median
  • 2.5 standard deviations below the mean
  • 2.5 standard deviations above the mean (CORRECT)

10. A data analytics team at a water utility works with a dataset that contains information about local reservoirs. They determine that the data follows a normal distribution. What Python function can they use to compute z-scores for the data? 

  • mean.zscore()
  • describe()
  • median.zscore()
  • stats.zscore() (CORRECT)

11. Fill in the blank: The _____ distribution best models the number of heads in 10 fair coin flips.

  • Bernoulli
  • Poisson
  • Binomial (CORRECT)
  • Normal

12. If all outcomes of an event are equally likely, how is its probability calculated?

  • Divide the number of desired outcomes by the total number of possible outcomes. (CORRECT)
  • Divide the total number of possible outcomes by the number of desired outcomes.
  • Divide the total number of certain outcomes by the number of possible outcomes.
  • Divide the total number of possible outcomes by the number of certain outcomes.

13. A coin is tossed twice. To calculate the probability of getting two heads in a row, which of the following equations should be used?

  • ½ ÷ ½
  • ½ * ½ (CORRECT)
  • ½ + ½
  • ½ – ½

14. Which of the following events are mutually exclusive? Select all that apply.

  • Getting heads on a first coin toss and tails on a second coin toss
  • Getting a 4 on a first die roll and a 6 on a second die roll
  • Getting heads and tails on the same coin toss (CORRECT)
  • Getting a 4 and a 6 on the same die roll (CORRECT)

15. What concept refers to the probability of an event before new data is collected?

  • Prior probability (CORRECT)
  • Subjective probability
  • Conditional probability
  • Posterior probability

16. Which of the following are examples of continuous random variables? Select all that apply.

  • The number of students in a math class
  • The height of a redwood tree (CORRECT)
  • The time it takes for a person to run a race (CORRECT)
  • The weight of a polar bear (CORRECT)

17. A data professional working for a smartphone manufacturer is analyzing sample data on the weight of a specific smartphone. The data follows a normal distribution, with a mean weight of 150g and a standard deviation of 10g. What data value lies 3 standard deviations below the mean?

  • 160g
  • 120g (CORRECT)
  • 130g
  • 180g

18. The mean and the standard deviation of a standard normal distribution always equal what values?

  • Mean = 2; standard deviation = 1
  • Mean = 0; standard deviation = 2
  • Mean = 1; standard deviation = 2
  • Mean = 0; standard deviation = 1 (CORRECT)

19. A data professional is analyzing sales data for a retail store. The data follows a normal distribution. What Python function can they use to compute z-scores for the data?

  • stats.zscore() (CORRECT)
  • median.zscore()
  • mean.zscore()
  • normal.zscore()

20. A first coin toss results in tails, and a second coin toss results in heads. What concept best describes these two events?

  • Subjective
  • Non-random
  • Independent (CORRECT)
  • Dependent

21. What concept refers to the probability of an event occurring given that another event has already occurred?

  • Classical probability
  • Conditional probability (CORRECT) 
  • Subjective probability
  • Empirical probability

22. Which of the following are examples of discrete random variables? Select all that apply.

  • The length of an airplane
  • The time it takes to drive from one city to another city
  • The number of radios produced in a factory each day (CORRECT)
  • The number of rooms in a hotel (CORRECT)

23. What probability distribution can model the probability of getting a certain number of defective products in a sample of 15 products?

  • Binomial distribution (CORRECT)
  • Normal distribution
  • Standard normal distribution
  • Poisson distribution

24. If a data value has a z-score of 0, what does the value equal?

  • The median
  • The standard deviation
  • The mean (CORRECT)
  • The mode

25. An investor believes there is a 90% chance that the price of a certain stock will increase in the next year. The investor’s prediction is based exclusively on intuition. What type of probability are they using?

  • Subjective (CORRECT)
  • Empirical
  • Objective
  • Classical

26. A six-sided die is rolled. To find the probability of rolling either a one or a three, what rule of probability should be used?

  • Addition rule (CORRECT)
  • Division rule
  • Complement rule
  • Multiplication rule

27. A jar contains four marbles: Two marbles are red, one is green, and one is blue. One marble is taken from the jar. What is the probability that the marble is blue?

  • 100%
  • 50%
  • 25% (CORRECT)
  • 75%

28. A data professional working for a smartphone manufacturer is analyzing sample data on the weight of a smartphone. The data follows a normal distribution, with a mean weight of 150g and a standard deviation of 10g. What data value lies at the center of the distribution curve?

  • 160g
  • 140g
  • 10g
  • 150g (CORRECT)

29. If the probability of an event equals 1, what is the chance that the event will occur?

  • 1%
  • 10%
  • 50%
  • 100% (CORRECT)

Correct: If the probability of an event equals 1, there is a 100% chance that the event will occur. Probability is expressed as a number between 0 and 1. If the probability of an event is close to zero, there is a small chance that it will occur. If the probability is close to 1, there is a strong chance that it will occur.

30. Fill in the blank: The addition rule states that, if the events A and B are ____, then the probability of A or B happening is the sum of the probabilities of A and B.

  • mutually inclusive
  • mutually exclusive (CORRECT)
  • highly likely
  • highly unlikely

Correct: The addition rule states that, if the events A and B are mutually exclusive, then the probability of A or B happening is the sum of the probabilities of A and B. Two events are mutually exclusive if they cannot occur at the same time.
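
The addition rule can be illustrated with the die roll from question 26: rolling a one and rolling a three cannot both happen on the same roll, so their probabilities are simply added. This is a minimal sketch that assumes a fair six-sided die.

```python
from fractions import Fraction

# On a fair six-sided die, each face has probability 1/6.
p_one = Fraction(1, 6)
p_three = Fraction(1, 6)

# Addition rule for mutually exclusive events: P(A or B) = P(A) + P(B)
p_one_or_three = p_one + p_three
print(p_one_or_three)  # 1/3

# Cross-check by counting favorable outcomes among all six faces.
faces = range(1, 7)
favorable = [face for face in faces if face in (1, 3)]
assert Fraction(len(favorable), len(faces)) == p_one_or_three
```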

31. Fill in the blank: Two events are _____ if the occurrence of one event changes the probability of the other event.

  • independent
  • dependent (CORRECT)
  • subjective
  • objective

Correct: Two events are dependent if the occurrence of one event changes the probability of the other event.

32. What does Bayes’ theorem enable data professionals to calculate?

  • Interquartile range
  • Standard deviation
  • Mean
  • Posterior probability (CORRECT)

Correct: Bayes’ theorem enables data professionals to calculate posterior probability, or the updated probability of an event based on new data.

33. Fill in the blank: A _____ random variable has a countable number of possible values.

  • classical
  • subjective
  • discrete (CORRECT)
  • continuous

Correct: A discrete random variable has a countable number of possible values.

34. Fill in the blank: The binomial distribution models the probability of events with _____ possible outcomes.

  • four
  • two (CORRECT)
  • five
  • three

Correct: The binomial distribution models the probability of events with two possible outcomes.

35. The Poisson distribution can model the probability that a certain number of events will occur during a specific time period.

  • True (CORRECT)
  • False

Correct: The Poisson distribution can model the probability that a certain number of events will occur during a specific time period.
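
As an illustration, the Poisson probability of observing exactly k events can be computed from the average rate for the time period. The helper name poisson_pmf and the rate of 4 calls per hour are hypothetical; the call center is borrowed from question 7.

```python
from math import exp, factorial


def poisson_pmf(k: int, rate: float) -> float:
    """P(exactly k events in a period), given the average number of events per period."""
    return rate**k * exp(-rate) / factorial(k)


# Hypothetical call center that averages 4 calls per hour:
# probability of receiving exactly 2 calls in a given hour.
print(f"{poisson_pmf(2, 4):.4f}")  # 0.1465

# The probabilities over all possible counts sum to 1 (50 terms is ample for this rate).
total = sum(poisson_pmf(k, 4) for k in range(50))
print(total)
```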

36. What shape is the graph of a normal distribution?

  • Triangular
  • Rectangular
  • Bell-shaped (CORRECT)
  • Square

Correct: The normal distribution is a continuous probability distribution that is symmetrical on both sides of the mean and bell-shaped. It is often called the bell curve because its graph has the shape of a bell, with a peak at the center and two downward sloping sides.

37. What is the z-score of a data value equal to the mean?

  • 2
  • 1
  • 0 (CORRECT)
  • 3

Correct: The z-score is 0 if the data value is equal to the mean. A z-score is a measure of how many standard deviations below or above the population mean a data point is.

CONCLUSION – Probability

This module covered probability as a foundation for data analysis, from the fundamental rules governing single-event probability to more advanced methods such as Bayes’ theorem. It also explored the binomial, Poisson, and normal distributions, which help describe the structure of data.

Participants finish the module with both the theoretical underpinnings of probability and the practical skills to apply them: describing complex events, making informed, data-driven decisions, and interpreting data with confidence and precision.