META MARKETING ANALYTICS PROFESSIONAL CERTIFICATE

Course 3: Statistics for Marketing

Week 1: Descriptive Statistics

Coursera Study Guide

CONTENT

This week gives you an overview of the Statistics for Marketing course and introduces the basics of descriptive statistics and when to use them. You will also be introduced to Bayesian statistics, get an overview of your capstone project, and complete part one of it by the end of the week.

Learning Objectives

  • Explain Descriptive Statistics: mean, median, mode, standard deviation, distribution
  • Explain the use cases of Descriptive Statistics: Central Tendency, Measures of Dispersion, Frequency Tables
  • Determine basic statistics using Google Sheets or Spreadsheets
  • Define the fundamentals of Bayesian thinking

PRACTICE QUIZ: MEASURES OF CENTRAL TENDENCY

1. Which of the following is NOT a measure of central tendency?

  • Median
  • Variance (CORRECT)
  • Mean
  • Mode

Correct: Good job! The variance is not a measure of central tendency. Rather, it is a measure of dispersion, that is, how spread out the data is.

2. Based on the video, what are measures of central tendency?

  • They are numbers that tell you how accurate the data is.
  • They are numbers that represent how spread out the data is in a dataset.
  • They identify which number in the dataset is the middle one.
  • They are numbers that represent the “middle” of a dataset. (CORRECT)

Correct: Good job! Measures of central tendency are a way of describing the “middle” of a dataset. Because datasets can be vastly different, the methods for determining their middles can be different as well. The three most common measures are the mean, median, and mode.

3. How might a marketer find value in knowing the middle of a dataset?

  • I. It can help predict future sales numbers.
  • II. It can help assess the impact of marketing efforts.
  • III. It can provide a reasonable baseline for what to expect from a particular demographic.
  • II and III. (CORRECT)
  • I and III.
  • III only.
  • I, II, and III.

Correct: Good job! In practice, a great many data values will fall close to the middle of a given dataset. Knowing the middle is very useful if you want to avoid over- or underestimating a characteristic of the population (which is often the case). Measures of central tendency can also provide insight into how impactful marketing efforts are.

4. What is the mean of the following set of numbers?

1, 4, 3, 10, 12

  • The mean is 3.
  • The mean is 4.
  • The mean is 6. (CORRECT)
  • The mean is 30.

Correct: Good job! The mean is calculated by adding up all the values in the set and dividing by the number of values that you added together. In this case, the sum of these five numbers is 30. Dividing this by 5 gets us 6 as the mean.
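
The calculation described above can be sketched in a few lines of Python. The language and the variable names are assumptions for illustration only; the course itself works in spreadsheets.

```python
# Mean: add up the values, then divide by how many values there are.
values = [1, 4, 3, 10, 12]

total = sum(values)    # sum of the five numbers
count = len(values)    # number of values that were added
mean = total / count   # the mean of the dataset

print(total, count, mean)
```

The spreadsheet function AVERAGE performs the same two steps on a cell range.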

5. Which measure of central tendency would be the most reasonable for defining the middle of the dataset below?

-2, 4, 1, -1, 3, 73

  • Mean, median, or mode is suitable.
  • The median. (CORRECT)
  • The mean.
  • The mode.

Correct: Good job! The median is the most suitable measure of central tendency for finding the middle of this set of numbers. The value of 73 is much larger than the rest of the data values and will pull the mean value to the right. The median, on the other hand, will not be skewed by a small proportion of outliers.
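
To see how strongly the single value 73 affects each measure, here is a minimal Python sketch using the standard `statistics` module. The variable names are illustrative assumptions, not part of the course material.

```python
from statistics import mean, median

values = [-2, 4, 1, -1, 3, 73]

mean_value = mean(values)      # pulled upward by the outlier 73
median_value = median(values)  # average of the two middle sorted values

# Drop the outlier and recompute to see how much each measure moves.
without_outlier = [v for v in values if v != 73]
mean_without = mean(without_outlier)
median_without = median(without_outlier)

print("with 73:   ", mean_value, median_value)
print("without 73:", mean_without, median_without)
```

The mean falls from 13 to 1 once the outlier is removed, while the median only moves from 2 to 1.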

6. Calculate the median for the following set of numbers.

1, 2, 2, 3, 14, 2

  • The median is 2. (CORRECT)
  • The median is 1.
  • The median is 4.
  • The median is 14.

Correct: Good job! The median is the middle data value in the sorted dataset. In this case, there are an even number of data values so the median is the average of the middle two values in the sorted dataset. Since both of these middle values are 2, their average is also 2.
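
The sort-then-pick-the-middle procedure, including the even-count case, might look like this in Python. The function name `median_of` is a hypothetical helper chosen for this sketch.

```python
def median_of(values):
    """Return the median: the middle of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[middle]
    # Even count: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_of([1, 2, 2, 3, 14, 2]))  # even count
print(median_of([1, 4, 3, 10, 12]))    # odd count
```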

7. In a spreadsheet, the following formula will find the mean of the numbers in the cells E200 to E350.

=AVERAGE(E200:E350)
  • True (CORRECT)
  • False

Correct: Good job! This is indeed the correct syntax for the formula. The parentheses enclose only the E200:E350 part of the formula, the formula leads off with a “=” sign, and the command AVERAGE is used.

8. What is the mode for the following set of numbers?

3, 1, 10, 9, 2, 5

  • There is no mode. (CORRECT)
  • The mode is 4.
  • The mode is 5.
  • The mode is 10.

Correct: Good job! Since none of these numbers are repeated there is no mode.

9. What is an advantage the median has over the mean?

  • It better represents the middle for large datasets.
  • It better represents a typical value when the values in the dataset are all close to each other.
  • It handles negative numbers better.
  • The median is insensitive to outliers. (CORRECT)

Correct: Good job! The mean is highly sensitive to outliers: a few numbers much larger (or smaller) than the rest have a strong pull on where the mean falls. The median does not have this issue.

10. What is the mean of the following set of numbers?

1, 2, 2, 3, 14, 2

  • The mean is 2.
  • The mean is 24.
  • The mean is 14.
  • The mean is 4. (CORRECT)

Correct: Good job! The mean is calculated by adding up all the values in the set and dividing by the number of values that you added. In this case, the sum of these six numbers is 24. Dividing this by 6 gets us 4 as the mean.

11. In a spreadsheet, the following formula will find the mode of the numbers in the cells A1 to A1000.

=(MODE A1:A1000)
  • True
  • False (CORRECT)

Correct: Good job! This is indeed not the correct syntax for the formula. While most of it is correct, the parentheses should enclose only the A1:A1000 part of the formula, so the formula would look like this: =MODE(A1:A1000).

12. Calculate the median for the following set of numbers.

-4, 1, 7, -2, 48, 22, -11

  • The median is 1. (CORRECT)
  • The median is 48.
  • The median is 0.
  • The median is -2.

Correct: Good job! When this list is sorted (-11, -4, -2, 1, 7, 22, 48), the number in the middle is 1. Therefore 1 is the median.

13. In a spreadsheet, the following formula will find the mean of the numbers in the cells D2 to D20.

AVERAGE(D2:D20)
  • True
  • False (CORRECT)

Correct: Good job! The formula is almost correct. In order to be correct, the formula should lead off with a “=” sign.

14. Which of the following is/are measure(s) of central tendency?

  • Variance
  • Mean
  • Median
  • Mode
  • Mean, Mode, and Median (CORRECT)
  • Variance, Median, Mode
  • Variance, Mean, Mode
  • Variance, Mean, Median

Correct: Good job! Mean, mode, and median are all measures of central tendency.

15. What is the mode(s) for the following set of numbers?

7, -7, 4, 0, -1, -7, 0

  • The mode is 0.
  • The modes are -7 and 0. (CORRECT)
  • The mode is -7.
  • There is no mode.

Correct: Good job! As this answer implies, the mode of a dataset does not have to be unique. It is possible for a dataset to have multiple modes. Such a dataset is said to be multimodal. This dataset is called bimodal since there are two modes.
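
A small Python sketch can find every mode of a dataset. The helper name `modes_of` is hypothetical, and returning an empty list when no value repeats is an assumption made here to match the quiz's "there is no mode" convention.

```python
from collections import Counter


def modes_of(values):
    """Return all most-frequent values, or [] when nothing repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Quiz convention (assumed here): no repeated value means no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes_of([7, -7, 4, 0, -1, -7, 0]))  # bimodal dataset from the question
print(modes_of([3, 1, 10, 9, 2, 5]))       # no repeated values
```

Note that Python's built-in `statistics.multimode` behaves differently for data without repeats: it returns every value, since all are tied for most frequent.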

PRACTICE QUIZ: MEASURES OF DISPERSION

1. How can a z-score be useful?

  • I. It can help in determining if a data value is high or low in the population.
  • II. It tells you how many standard deviations from the mean a data value is.
  • III. It can be used to determine data outliers.
  • II and III
  • All three are ways a z-score can be useful. (CORRECT)
  • I and III
  • I and II

Correct: Nice job. The z-score can be used to assess each of these three characteristics of a data value pulled from a dataset. Not all of them will be of interest every time, but you will often want to know at least one of them when you pull or receive a new data value.

2. Measures of variation is an alternative name for measures of central tendency.

  • True.
  • False (CORRECT)

Correct: Good job! Measures of variation and measures of central tendency are not the same thing. Whereas measures of central tendency give an idea of the middle of a data set, measures of variation give an idea of how spread out the data set is.

3. The formula for finding the z-score for a data value is

z = (value – mean)/std.
  • True (CORRECT)
  • False

Correct: Nice job. We calculate a z-score for a data value by subtracting the mean from the data value and then dividing the result by the standard deviation.

4. What is the purpose of a measure of variation?

  • It provides an indication of how spread out the data is. (CORRECT)
  • It gives an idea of where the middle of the data is.
  • It determines the size of the dataset.
  • It determines the accuracy of the data.

Correct: Good job! Measures of variation are functions that ascribe numbers to the dataset with the intent of measuring how spread out the dataset is.

5. Which of the following are examples of measures of variation?

  • I. Range
  • II. Standard deviation
  • III. Mean
  • All three.
  • I and II (CORRECT)
  • II and III
  • I and III

Correct: Good job! Range and standard deviation are both measures of variation that we discussed in this lesson; they give an indication of how spread out the data is. The mean, by contrast, is a measure of central tendency.

6. What is the range for the following dataset?

2, 3, -5, 8, 0, -2

  • -5
  • 8
  • 13 (CORRECT)
  • 5.5

Correct: Good job! The range is the difference between the maximum data value and the minimum data value. Here that is 8 - (-5) = 13.

7. Suppose that you are given the following data. Would the range be a reasonable measure of the spread in the data? Why or why not?

2, 1, 5, 10, 6, 45

  • Yes, it is reasonable because there are no negative numbers in the dataset.
  • Yes, it is reasonable because the range is positive.
  • No, it is not reasonable because the range is too large.
  • No, it is not reasonable because there is an outlier. (CORRECT)

Correct: Good job! Even though the range can be calculated for this dataset, it will give a misleading idea for the spread of the data. All the values, with the exception of 45, are between 1 and 10, which is a range of 9. When 45 is included, the range becomes 44. This latter calculation suggests that the data is more spread out than it really is.
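
The effect of the single value 45 on the range can be checked with a short Python sketch. The helper name `value_range` is an assumption for illustration.

```python
def value_range(values):
    """Range: the maximum value minus the minimum value."""
    return max(values) - min(values)


values = [2, 1, 5, 10, 6, 45]
full_range = value_range(values)      # 45 - 1

# Leave out the outlier and recompute.
trimmed = [v for v in values if v != 45]
trimmed_range = value_range(trimmed)  # 10 - 1

print(full_range, trimmed_range)
```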

8. For a normal distribution, what percentage of data values should you expect to fall within two standard deviations of the mean?

  • 13.6%
  • 95.2% (CORRECT)
  • 68%
  • 47.6%

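Correct: Nice job. About 95% of the data values should fall within two standard deviations of the mean. The 95.2% figure is a rounded value; the exact share for a normal distribution is about 95.45%.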
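
The share of a normal distribution that lies within a given number of standard deviations of the mean can be computed directly. The sketch below uses `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later); the helper name `share_within` is an assumption.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 2))
```

The results are roughly 68%, 95%, and 99.7%, which is why figures such as 68% and 95% appear as quiz options.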

9. Suppose a dataset has a standard deviation of 4 and a mean of 3. What is the z-score for a data value of 6?

  • The z-score is -1
  • The z-score is 0.75 (CORRECT)
  • The z-score is 1.0.
  • The z-score is -0.75

Correct: Nice job. To find the z-score for a data value you just use the formula z = (value – mean)/std. Here that is (6 – 3)/4 = 0.75.
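
The z-score formula translates directly into code. This is a minimal Python sketch; the function name `z_score` and its parameter names are assumptions for illustration.

```python
def z_score(value, mean, std):
    """Number of standard deviations that value lies from the mean."""
    if std == 0:
        raise ValueError("standard deviation must be non-zero")
    return (value - mean) / std


# Mean 3, standard deviation 4, data value 6.
print(z_score(6, 3, 4))
```

A positive result means the value is above the mean; a negative result means it is below.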

10. What is the range for the following dataset?

-7, 1, -3, -2, 4, -1

  • -7
  • 0
  • 4
  • 11 (CORRECT)

Correct: Good job! The range is the difference between the maximum data value and the minimum data value. Here that is 4 - (-7) = 11.

GRADED QUIZ: DESCRIPTIVE STATISTICS

1. What is the formula for finding the standard deviation in a spreadsheet? (Assume that the data is in the cells A1 to A100.)

  • =STDEV(A1:A100) (CORRECT)
  • =STD(A1:A100)
  • =(DEV A1:A100)
  • =STANDARDDEVIATION(A1:A100)

Correct: Exactly! To find the standard deviation in a spreadsheet, you would use =STDEV() with the selected cells in the parentheses.
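
In Google Sheets and Excel, STDEV computes the sample standard deviation (dividing by n - 1). The Python sketch below shows the equivalent calculation; the data values are hypothetical and the variable names are assumptions.

```python
from math import sqrt
from statistics import pstdev, stdev

values = [23, 20, 31, 11, 15, 19]  # hypothetical data for illustration

sample_sd = stdev(values)       # divides by n - 1, like =STDEV(...)
population_sd = pstdev(values)  # divides by n instead

# The sample standard deviation written out step by step.
m = sum(values) / len(values)
squared_deviations = sum((v - m) ** 2 for v in values)
manual_sd = sqrt(squared_deviations / (len(values) - 1))

print(sample_sd, population_sd, manual_sd)
```

The sample version is always slightly larger than the population version for the same data, because it divides by a smaller number.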

2. What is the range for the given dataset?

23, 20, 31, 11, 15, 19

  • 23
  • 11
  • -20
  • 20 (CORRECT)

Correct: Good job! The range is the difference between the maximum and minimum values in the dataset. Here that is 31 - 11 = 20.

3. What is the mode for the following set of numbers?

5, -1, 3, 8, 2, 1, 0

  • There is no mode. (CORRECT)
  • The mode is 5.
  • The mode is -1.
  • The mode is 1.

Correct: Good job! The mode of a dataset is the most common data value. But this dataset does not have any repeated values.

4. In a normal distribution, what percentage of data values are found above the mean?

  • 34%
  • 95.2%
  • 68%
  • 50% (CORRECT)

Correct: Good job. For a normal distribution, half of the data values are found above the mean. This is the entire area under the bell curve to the right of the mean.

5. What is the median for the given dataset?

1, 3, 7, 5, 3

  • The median is 4.
  • The median is 3. (CORRECT)
  • The median is 5.
  • The median is 7.

Correct: Good job! The median of a dataset with an odd number of data values is the middle number when the data is sorted.

6. True or false: The mean is sensitive to outliers.

  • True (CORRECT)
  • False

Correct: Good job! The mean is sensitive to outliers: a few extreme values can shift it substantially. The median is often the better choice for such data since it changes little in the presence of outliers.

7. What is the mean of the given set of numbers?

-1, 3, 6, 5, -3

  • The mean is 2. (CORRECT)
  • The mean is 3.
  • The mean is 6.
  • The mean is 0.

Correct: Good job! The mean is the sum of all the data values divided by the number of data values in the dataset.

8. What does a measure of variation tell you about a dataset?

  • I.  How spread out the data is
  • II.  Where the center of the data is
  • III.  How large the dataset is
  • IV.  All of these
  • II
  • IV
  • I (CORRECT)
  • III

Correct: Good job! Measures of variation describe how spread out the dataset’s values are. The center of the dataset is described by measures of central tendency, and the size of the dataset is given by the number of values it contains.

9. Which of the following is not a way a marketer can use a measure of central tendency?

  • I.   Identifying sales frequency
  • II.  Predicting customer behavior
  • III. Risk analysis
  • I
  • II
  • III (CORRECT)
  • None of these

Correct: Good job! A marketer may be able to use measures of central tendency to identify sales frequency and in predicting customer behavior. However, they are not used in risk analysis. That is one of the purposes of measures of variation.

10. Suppose a dataset has a standard deviation of 2 and a mean of 7. What is the z-score for a data value of 3?

  • The z-score is -0.75
  • The z-score is 0.60.
  • The z-score is 2.0.
  • The z-score is -2.0 (CORRECT)

Correct: Nice job. To find the z-score for a data value you just use the formula z = (value – mean)/std. Here that is (3 – 7)/2 = -2.0.

11. What is the mode for the following set of numbers?

6, 1, 4, 0, 1, 1, 0

  • The mode is 0.
  • There is no mode.
  • The mode is 1. (CORRECT)
  • The mode is 4.

Correct: Good job! The mode of a dataset is the most common data value.

12. What is the median for the given dataset?

19, 42, 33, 15, 21

  • The median is 33.
  • The median is 15.
  • The median is 21. (CORRECT)
  • The median is 45.

Correct: Good job! The median of a dataset with an odd number of data values is the middle number when the data is sorted.

13. True or false: The median is sensitive to outliers.

  • False (CORRECT)
  • True

Correct: Good job! The median is not sensitive to outliers. It depends only on the middle value(s) of the sorted data, so a few extreme values do not pull it the way they pull the mean.

14. What is the mean of the given set of numbers?

10, 20, 30, -15, -25, -20

  • The mean is -25.
  • The mean is 30.
  • The mean is 10.
  • The mean is 0. (CORRECT)

Correct: Good job! The mean is the sum of all the data values divided by the number of data values. Here the sum is 10 + 20 + 30 - 15 - 25 - 20 = 0, and dividing by 6 gives a mean of 0.

15. How can a marketer use a measure of variation?

  • I.    Data reliability
  • II.   Range targeting
  • III.   Risk analysis
  • IV.  All of these
  • I
  • II
  • IV (CORRECT)
  • III

Correct: Good job! Measures of variation can be extremely useful for a marketer in all three of these ways.

16. What do measures of central tendency tell you about a dataset?

  • I.   They represent how spread out the data is
  • II.  They represent the middle of the dataset
  • III. They tell you how large the dataset is
  • I
  • III
  • II (CORRECT)
  • I, II, and III

Correct: Good job! A measure of central tendency provides a number that can be interpreted as the middle of the dataset.

17. What is the median for the given dataset?

1, -2, 3, -1, 2, -3

  • The median is -3.
  • The median is 3.
  • The median is 1.
  • The median is 0. (CORRECT)

Correct: Good job! The median of a dataset with an even number of data values is the average of the two middle numbers when the data is sorted.

18. True or false: The median is preferred for datasets that have outliers.

  • True (CORRECT)
  • False

Correct: Good job! The median is often better to use than the mean for such datasets, since it changes little in the presence of outliers.

19. What is the mean of the given set of numbers?

4, 7, 3, 1, 2, 1

  • The mean is 0.
  • The mean is 2.5.
  • The mean is 3. (CORRECT)
  • The mean is 1.

Correct: Good job! The mean is the sum of all the data values divided by the number of data values in the dataset.

20. What is the formula for finding the median of a dataset in a spreadsheet? (Assume that the data is in the cells A1 to A100.)

  • MEDIAN(A1:A100)
  • =MEDIAN_A1:A100
  • =MEDIAN(A1:A100) (CORRECT)
  • =MDN(A1:A100)

Correct: Good job. This is the correct formula to find the median in a spreadsheet if the data is contained in cells A1 through A100.

21. What is the mode for the following set of numbers?

26, 11, 45, 0, 7, 7, 0

  • The mode is 45.
  • The mode is 0.
  • The modes are 7 and 0. (CORRECT)
  • The mode is 7.

Correct: Good job! The mode of a dataset is the most common data value. It is possible for a dataset to have more than one mode.

23. Which of the following is the correct formula for Median?

  • MEDIAN(A1:A3)
  • =MEDIAN A1:A3
  • =MEDIAN(A1:A3) (CORRECT)

Correct: Exactly! This is the formula you will use to find the median.

24. What is the range of the following set of numbers?

24, 11, 13, 26, 18, 21, 15

  • 13
  • -15
  • 17
  • 15 (CORRECT)

Correct: Exactly! Since the max is 26, you would subtract the min, which is 11, to get a range of 15.

25. What is a popular cut-off for outliers?

  • Anything less than 3 standard deviations away from the mean.
  • Anything more than 3 standard deviations away from the mean. (CORRECT)
  • Anything more than 2 standard deviations away from the mean.
  • Anything more than 1 standard deviation away from the mean.

Correct: Exactly! Anything beyond 3 standard deviations is often considered an outlier.
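
The 3-standard-deviation rule can be applied by computing a z-score for every value and keeping those beyond the cutoff. This is a minimal Python sketch: the function name `find_outliers`, the choice of the sample standard deviation, and the data are all assumptions for illustration.

```python
from statistics import mean, stdev


def find_outliers(values, cutoff=3.0):
    """Return the values lying more than `cutoff` standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (assumed choice)
    if s == 0:
        return []      # all values identical, so nothing stands out
    return [v for v in values if abs(v - m) / s > cutoff]


# Hypothetical daily order counts with one unusual day.
orders = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11,
          10, 12, 9, 11, 10, 11, 10, 9, 12, 11, 100]

print(find_outliers(orders))
```

With only a handful of values, no single point can reach a z-score of 3 when the standard deviation is computed from the same data, so the rule is more useful on larger datasets.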

26. The equation for z-score is…

  • = (Value-Standard Deviation) / Mean
  • = ZSCORE(A1 : A3)
  • = (Mean – Value) / Standard Deviation
  • = (Value-Mean) / Standard Deviation (CORRECT)

Correct: Exactly! Z-score is the value you want to test minus the mean and divided by the standard deviation.

27. Contingency tables look at how many categorical variables?

  • One categorical variable.
  • Two or more categorical variables. (CORRECT)
  • No more than two categorical variables
  • None.  Contingency tables do not look at categorical variables

Correct: Exactly! Contingency tables require at least two categorical variables, but they can hold more.
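
A contingency table counts how often each combination of categories occurs. The Python sketch below builds one for two categorical variables; the survey rows, the category names, and the variable names are all hypothetical.

```python
from collections import Counter

# Hypothetical survey rows: (age_group, preferred_channel).
responses = [
    ("18-34", "social"),
    ("18-34", "social"),
    ("18-34", "email"),
    ("35-54", "email"),
    ("35-54", "email"),
    ("35-54", "social"),
    ("55+", "email"),
]

# Count each (age_group, channel) combination.
counts = Counter(responses)
rows = sorted({age for age, _ in responses})
cols = sorted({channel for _, channel in responses})

# Print the table: one row per age group, one column per channel.
print("age".ljust(8) + "".join(c.rjust(8) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(counts[(r, c)]).rjust(8) for c in cols))
```

Combinations that never occur, such as ("55+", "social") here, show up as 0 because a `Counter` returns 0 for missing keys.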

28. A correlation coefficient of -0.65 is considered…

  • Medium Correlation (CORRECT)
  • High Correlation
  • Low Correlation
  • No Correlation

Correct: Exactly! A number between -0.5 and -0.7 or between 0.5 and 0.7 is considered a medium correlation.
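
As a rough illustration, the Pearson correlation coefficient can be computed and labeled as follows. Only the medium band (0.5 to 0.7 in absolute value) comes from the explanation above; the "high", "low", and "none" boundaries, the function names, and the sample data are assumptions made for this sketch.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (sx * sy)


def strength(r):
    """Label a correlation coefficient by its absolute size."""
    size = abs(r)
    if 0.5 <= size <= 0.7:
        return "medium"  # band stated in the quiz explanation
    if size > 0.7:
        return "high"    # assumed boundary
    if size > 0:
        return "low"     # assumed boundary
    return "none"        # assumed: exactly zero


# Hypothetical data: ad spend versus sales.
spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
r = pearson_r(spend, sales)
print(round(r, 2), strength(r))
print(strength(-0.65))
```

The sign of the coefficient gives the direction of the relationship, while the label depends only on its absolute value, which is why -0.65 and 0.65 are both medium.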