META MARKETING ANALYTICS PROFESSIONAL CERTIFICATE
Course 3: Statistics for Marketing
Week 4: Data Modeling
Coursera Study Guide
CONTENT
- PRACTICE QUIZ: STATISTICAL MODELING
- PRACTICE QUIZ: SIMPLE LINEAR REGRESSION
- PRACTICE QUIZ: CLUSTER ANALYSIS
- PRACTICE QUIZ: TIME SERIES
- GRADED QUIZ: STATISTICAL MODELING
This week you’ll be introduced to various model families and how to create them using Tableau. You’ll also learn how to interpret the results of these models. You’ll complete the fourth and final part of your capstone project, which you will be submitting as a peer review next week.
Learning Objectives
- Define various model families: regression, clustering, time series
- Evaluate appropriate model family given an appropriate context or situation
- Interpret the results and outputs of various models
- Create basic statistical models for regression, clustering, and time series using Tableau
- Understand the basic assumptions, use cases, and limitations of regression, cluster, and time series models
PRACTICE QUIZ: STATISTICAL MODELING
1. What is statistical analysis?
- I. Equations used to analyze data.
- II. Field of study based around the use of data.
- III. Processes used to analyze data.
- All three. (CORRECT)
- I.
- II.
- III.
Correct: Good job! When considered in the general academic sense, statistical analysis is the field of study around the use of data. In practice, it usually refers to the equations or processes involved with analyzing data.
2. What is statistical modeling?
- The equations or processes used in analyzing data.
- The field of study based around the use of data.
- The application of a statistical analysis to data. (CORRECT)
- It is the result of analyzing data with a model.
Correct: Good job! At first it may seem like this is the same definition as statistical analysis (at least in part). However, it is different. Statistical analysis refers to the tools used to analyze data, while statistical modeling is the implementation of the analysis to the data.
3. What is a statistical model?
- The equations used to analyze data.
- The field of study based around the use of data.
- The results from statistical modeling. (CORRECT)
- The application of a statistical analysis.
Correct: Good job! A statistical model is the result or outcome of applying a statistical analysis to data. The application itself is statistical modeling.
4. True or false: Time series analysis includes a time factor.
- True (CORRECT)
- False
Correct: Good job! As the name suggests, a time series analysis will involve time in some manner. Typically, time is used as an independent variable.
5. What are the two versions of machine learning?
- Unsupervised and clustering
- Regression and classification
- Regression and clustering
- Supervised and unsupervised (CORRECT)
Correct: Good job! These are the two versions of machine learning that are commonly used.
6. What type of machine learning would be suitable for data that is unlabeled?
- Supervised
- Regression
- Unsupervised (CORRECT)
- Categorical
Correct: Good job! Unsupervised learning is appropriate to use on unlabeled data. Labeled data will use supervised machine learning methods.
7. Regression and Classification models fall under which type of machine learning?
- Supervised (CORRECT)
- Unsupervised
Correct: Good job! Regression and classification models are supervised machine learning.
PRACTICE QUIZ: SIMPLE LINEAR REGRESSION
1. In simple linear regression, what does the word “linear” mean?
- That the regression is easy to perform.
- That there is only one independent variable.
- That the independent and dependent variables are not related.
- That a line is used to relate the independent and dependent variables. (CORRECT)
Correct: Good job! Graphically, a linear relationship describes a line. Thus, in simple linear regression, the model that is produced by the regression will have the graph of a line.
2. What does the r² in a linear regression mean?
- It is the accuracy of the model.
- It is the slope of the line created by the regression.
- It tells you how good the model is.
- It is a measure of how much of the variance is explained by the independent variable. (CORRECT)
Correct: Good job! In simple terms, r² measures the proportion of the variance in the dependent variable that is explained by the independent variable.
3. What is the residual?
- It is the set of data values that are close to the regression line.
- It is the set of outliers in the data.
- It is the difference between the recorded data value and the predicted data value. (CORRECT)
- It is the collection of data that was not used to construct the model.
Correct: Good job! The residual is often referred to as the error in the regression model. It measures how far off the data value is from what the regression model predicts.
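To make r² and residuals concrete, here is a minimal Python sketch of simple linear regression fitted by ordinary least squares. It is a hand-written illustration, not Tableau's own implementation, and the impressions and clicks figures are hypothetical. Each residual is the observed value minus the predicted value, and r² is computed as 1 - SS_res / SS_tot.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Residual = recorded value - predicted value, for each data point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def r_squared(y, resid):
    """Share of the variance in y explained by the model: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum(r ** 2 for r in resid)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical ad data: impressions (independent) and clicks (dependent)
impressions = [1000, 2000, 3000, 4000]
clicks = [12, 19, 31, 38]

b0, b1 = fit_simple_linear_regression(impressions, clicks)
resid = residuals(impressions, clicks, b0, b1)
print(f"clicks = {b0:.2f} + {b1:.4f} * impressions")
print("residuals:", [round(r, 2) for r in resid])
print("r^2:", round(r_squared(clicks, resid), 3))
print("predicted clicks at 5,000 impressions:", round(b0 + b1 * 5000, 1))
```

The same fitted line answers "how many clicks at 5,000 impressions?" style questions: plug the new independent value into the equation.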
4. Which of the following is one of the assumptions in simple linear regression?
- I. Linearity
- II. Randomness
- III. Maximum sample size
- I. (CORRECT)
- II.
- III.
- All of these.
Correct: Good job! The assumption of linearity in the model is one of the basic assumptions of simple linear regression.
5. Regression and classification models are similar in that they both:
- Predict categorical variables.
- Use dependent variables to predict independent variables.
- Predict numerical variables.
- Use independent variables to predict dependent variables. (CORRECT)
Correct: Good job! Both regression and classification models predict dependent variables using independent variables. Classification models predict categorical variables, while regression models predict numerical variables.
6. True or false: Naive Bayes is not a type of classification model.
- False (CORRECT)
- True
Correct: Good job! Naive Bayes is a type of classification model.
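To show what a classification model looks like in practice, here is a minimal categorical Naive Bayes sketch in Python. The features (ad channel and device) and the labels are hypothetical, and the Laplace (add-one) smoothing is an assumption added so that an unseen feature value does not zero out a class. The "naive" part is the assumption that the features are independent of one another given the class.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each feature value occurs."""
    class_counts = Counter(labels)
    # value_counts[label][feature_index][value] -> number of matching rows
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
    # Distinct values observed for each feature (needed for Laplace smoothing)
    feature_values = [set(row[i] for row in rows) for i in range(len(rows[0]))]
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Pick the class with the highest prior * product of per-feature likelihoods."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # prior P(label)
        for i, value in enumerate(row):
            # Laplace-smoothed P(value | label); features treated as independent given the label
            score *= (value_counts[label][i][value] + 1) / (count + len(feature_values[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: (ad channel, device) -> did the customer buy?
rows = [("email", "mobile"), ("email", "desktop"), ("social", "mobile"),
        ("social", "mobile"), ("email", "desktop"), ("social", "desktop")]
labels = ["buy", "buy", "no", "no", "buy", "no"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("email", "desktop")))  # buy
print(predict(model, ("social", "mobile")))  # no
```

Note that the output is a label ("buy" or "no"), not a number, which is what separates classification from regression.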
PRACTICE QUIZ: CLUSTER ANALYSIS
1. What is market segmentation?
- It is the process of creating sub-groups in a customer base using common traits or needs. (CORRECT)
- It is the use of statistical models on a target population.
- It is the belief that markets are diverse.
- It is the assumption that sub-groups of the population have the same likelihood of occurring.
Correct: Good job! Market segmentation can help an analyst understand the different audiences in the sample. This can help to create targeted marketing strategies.
2. How is clustering analysis useful to a marketing analyst?
- It homogenizes the sample.
- It determines what marketing strategy to employ.
- It determines if a marketing strategy is effective.
- It facilitates market segmentation. (CORRECT)
Correct: Good job! This is one of the big advantages of clustering analysis. It offers insight into the different audiences in the sample, which can allow for more targeted marketing strategies.
3. What clustering method is often considered the default method?
- Mean shift.
- K-means. (CORRECT)
- Hierarchical.
- Density-based spatial.
Correct: Good job! K-means clustering is usually the first method attempted in clustering analysis. Beware, though, as it is not always the best.
4. What is the assumption of homogeneity of variance in K-means clustering?
- That every cluster has the same likelihood of occurring.
- That the clusters are approximately elliptical, or round.
- That the minimum sample size is 50 times the number of clusters.
- That all the variables in the analysis have similar variance. (CORRECT)
Correct: Good job! For K-means clustering to be reliable, we must assume that the variables do not differ much in their variance.
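The sketch below illustrates K-means (Lloyd's algorithm) in plain Python, with a standardization step that rescales every variable to mean 0 and standard deviation 1 so that no single variable dominates the distances. Standardizing is one common way to address the homogeneity-of-variance assumption. The customer data (annual spend in dollars, visits per month) is hypothetical, and this is not Tableau's clustering implementation.

```python
import random


def standardize(points):
    """Rescale each variable to mean 0 and standard deviation 1 (z-scores)."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    stds = [(sum((p[j] - means[j]) ** 2 for p in points) / n) ** 0.5 for j in range(d)]
    return [[(p[j] - means[j]) / stds[j] for j in range(d)] for p in points]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps until stable."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct starting points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of the points assigned to it
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments no longer change the centroids
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: [annual spend in dollars, store visits per month]
customers = [[200, 2], [220, 3], [210, 2], [1500, 9], [1550, 10], [1480, 8]]
labels, centroids = k_means(standardize(customers), k=2)
print(labels)
```

Without the standardization step, spend (measured in hundreds of dollars) would swamp visits (single digits) in the distance calculation, which is exactly the situation the homogeneity-of-variance assumption warns about.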
PRACTICE QUIZ: TIME SERIES
1. What does a time series analysis do?
- It shortens the length of time to complete a statistical analysis.
- It evaluates a qualitative variable to see how it changes in time.
- It evaluates a quantitative variable to see how it changes in time. (CORRECT)
- It tracks how the independent variable changes with time.
Correct: Good job! Time series analysis examines a dependent variable and its evolution over time.
2. What is the purpose, or most common use, for time series analysis?
- It keeps track of how long an analysis takes to complete.
- It records the past values of the dependent variable.
- To predict, or forecast, future values of the dependent variable. (CORRECT)
- It identifies different sub-groups in the data that share similar traits.
Correct: Good job! Time series analysis uses past values of the dependent variable to try to predict what future values will be. This is also known as forecasting.
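As a rough illustration of forecasting from past values, here is a sketch of simple exponential smoothing, one of the most basic forecasting methods. It is a swapped-in example rather than the specific model the course builds in Tableau; it assumes the series has no trend or seasonality, and the monthly sales numbers are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation; larger alpha weights recent data more."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    levels = [level]
    for value in series[1:]:
        # New level = alpha * latest observation + (1 - alpha) * previous level
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels


def forecast(series, alpha, horizon):
    """Simple exponential smoothing gives a flat forecast equal to the last smoothed level."""
    last_level = simple_exponential_smoothing(series, alpha)[-1]
    return [last_level] * horizon


monthly_sales = [100, 110, 105, 115]  # hypothetical monthly sales
print(forecast(monthly_sales, alpha=0.5, horizon=3))  # [110.0, 110.0, 110.0]
```

The choice of alpha controls how quickly the forecast reacts to recent changes; values near 1 track the latest observation closely, while values near 0 smooth heavily.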
3. Which of the following is NOT an assumption in time series analysis?
- I. Dependence
- II. Independence
- III. Minimum sample size
- III.
- I.
- All three.
- II. (CORRECT)
Correct: Good job! Independence is not an assumption in time series analysis. Incidentally, independence and dependence are mutually exclusive; you can’t have both.
4. True or false: The minimum sample size in time series analysis is 50.
- True
- False (CORRECT)
Correct: Good job! Unlike many other models we’ve encountered so far, time series analysis does not have a set minimum sample size. The minimum sample size depends on what time units you are using.
5. True or false: When time is measured in years, the minimum sample size in a time series analysis is 12.
- True (CORRECT)
- False
Correct: Good job! Unlike many other models we’ve encountered so far, time series analysis does not have a set minimum sample size. The minimum sample size depends on what time units you are using, and for data collected annually, it’s a minimum of 12 years.
GRADED QUIZ: STATISTICAL MODELING
1. Consider the assumptions in simple linear regression. What is the assumption of normality?
- The data creates a normal distribution. (CORRECT)
- All independent variables have similar variance.
- The minimum sample is 20.
- The independent variables do not affect one another.
Correct: Good job! Most of the time in statistical modeling, normality refers to the data’s distribution being a normal distribution.
2. Which one of the following is a modeling technique in clustering analysis?
- I. Logistic regression
- II. Interpolation
- III. Fuzzy C-Means.
- III. (CORRECT)
- II.
- I.
- All of these.
Correct: Good job! Fuzzy C-Means is a commonly used technique in clustering analysis.
3. In time series analysis, the assumption that the data values all come from the same source is known as:
- dependence. (CORRECT)
- minimum sample size.
- constant time.
- stationarity.
Correct: Good job! Dependence is the opposite of independence, which we have discussed in the past. This assumption requires that the source of the observations does not change.
4. In time series analysis, you can convert longer units of time into shorter units of time.
- False. (CORRECT)
- True.
Correct: Good job! While you can always convert shorter units of time into longer ones in time series, you cannot do the reverse.
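The one-way nature of time-unit conversion is easiest to see with aggregation. The hypothetical Python sketch below sums daily sales into monthly totals; going the other way is not possible, because a monthly total does not say how those sales were split across the days.

```python
from collections import defaultdict
from datetime import date


def aggregate_by_month(daily_values):
    """Sum daily observations into monthly totals (shorter unit -> longer unit)."""
    monthly = defaultdict(float)
    for day, value in daily_values.items():
        monthly[(day.year, day.month)] += value
    return dict(sorted(monthly.items()))


# Hypothetical daily sales spanning a month boundary
daily_sales = {
    date(2024, 1, 30): 10,
    date(2024, 1, 31): 20,
    date(2024, 2, 1): 5,
    date(2024, 2, 2): 15,
}
print(aggregate_by_month(daily_sales))  # {(2024, 1): 30.0, (2024, 2): 20.0}
# The reverse is not possible: a January total of 30.0 could come from
# many different day-by-day patterns, so the daily values cannot be recovered.
```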
5. What is a classification model?
- A model that predicts a qualitative variable. (CORRECT)
- A model that predicts numerical data.
- A model that predicts the amount of something.
- A model that predicts a quantitative variable.
Correct: Good job! A qualitative variable is not numerical; rather, it is categorical. That is, it is a label. Classification models work with this type of variable.
6. What is the assumption of sphericity for K-means clustering?
- That the dataset can’t be clustered.
- That the entire dataset is elliptical or round.
- That the clusters are approximately elliptical or round. (CORRECT)
- That the clusters meet the minimum size.
Correct: Good job! Sphericity means each cluster is assumed to be approximately round (or elliptical) in shape. K-means tends to perform poorly when the true clusters are elongated or irregularly shaped.
7. In time series there are three assumptions: minimum sample size, dependence, and:
- stationarity. (CORRECT)
- homogeneity.
- independence.
- normality.
Correct: Good job! In mathematics and statistics, stationarity is an advantageous characteristic to have in your data. It allows for a host of statistical techniques, one being time series.
8. In time series analysis, what is the assumed minimum sample size if time is measured in months?
- 50 months of information. (CORRECT)
- 6 months of information.
- 100 months of information.
- 12 months of information.
Correct: Good job! The minimum sample size in time series is not fixed like most of the other analytic methods we’ve discussed. It varies based on the size of the time units being used.
9. Which of the following is NOT an assumption in simple linear regression?
- I. Independence
- II. Homogeneity of variance
- III. Sphericity
- All of them are.
- III. (CORRECT)
- I.
- II.
Correct: Good job! Sphericity is an assumption in clustering analysis, not simple linear regression.
10. Which of the following is a type of quantitative variable?
- Categorical.
- Independent.
- Continuous. (CORRECT)
- Dependent.
Correct: Good job! Continuous variables can take on all values in a given interval. Discrete variables can take on values from a given list.
11. Which one of the following is a modeling technique in clustering analysis?
- I. Simple linear regression
- II. K-means
- III. Logistic regression.
- I.
- II. (CORRECT)
- III.
- All of them.
Correct: Good job! K-means is one of the most popular clustering methods in statistical analysis.
12. In time series analysis, one of the independent variables must be time.
- False
- True. (CORRECT)
Correct: Good job! Time series analysis, as the name implies, depends on time. Consequently, one of the independent variables must be time.
13. In time series, what is the assumption of stationarity?
- That the minimum sample size is 50.
- That the mean value of the series is constant. (CORRECT)
- That the data values come from the same source.
- That the data follows a normal distribution.
Correct: Good job! In mathematics and statistics, stationarity is an advantageous characteristic to have in your data. It allows for a host of statistical techniques, one being time series.
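The course describes stationarity as a constant mean. A rough, informal way to see whether that holds is to compare the mean of the first and second halves of a series, and a common remedy when the mean drifts with a steady trend is first differencing. This is a sketch with a hypothetical series, not a formal test; in practice analysts often use tests such as the augmented Dickey-Fuller test (for example, `adfuller` in statsmodels).

```python
def mean(values):
    return sum(values) / len(values)


def half_means(series):
    """Mean of the first half and of the second half: a rough, informal stationarity check."""
    mid = len(series) // 2
    return mean(series[:mid]), mean(series[mid:])


def difference(series):
    """First differences y[t] - y[t-1], which often remove a steady trend in the mean."""
    return [b - a for a, b in zip(series, series[1:])]


trending = [10, 12, 14, 16, 18, 20]  # hypothetical series with an upward trend
print(half_means(trending))              # (12.0, 18.0): the mean is drifting
print(half_means(difference(trending)))  # (2.0, 2.0): constant after differencing
```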
14. In time series analysis, what is the assumed minimum sample size if time is measured in years?
- 5 years of information.
- 1 year of information.
- 50 years of information.
- 25 years of information. (CORRECT)
Correct: Good job! The minimum sample size in time series is not fixed like most of the other analytic methods we’ve discussed. It varies based on the size of the time units being used.
15. Which of the following is an assumption in simple linear regression?
- I. Independence
- II. Sphericity
- III. Dependence
- I. (CORRECT)
- II.
- III.
- All of them are.
Correct: Good job! Independence of the observations is one of the assumptions of simple linear regression. Sphericity is an assumption of K-means clustering, and dependence is an assumption of time series analysis.
16. Which one of the following is NOT a modeling technique in clustering analysis?
- I. Fuzzy C-Means
- II. K-Means
- III. General linear models
- All of these.
- III. (CORRECT)
- I.
- II.
Correct: Good job! General linear models are not used in clustering analysis.
17. In time series analysis, two of the three assumptions are minimum sample size and stationarity. What is the third?
- Dependence. (CORRECT)
- Normality.
- Homogeneity.
- Constant time.
Correct: Good job! Dependence is the opposite of independence, which we have discussed in the past. This assumption requires that the source of the observations does not change.
18. What is a regression model?
- A model that uses numbers as labels.
- A model that predicts categorical data.
- A model that predicts a qualitative variable.
- A model that predicts a quantitative variable. (CORRECT)
Correct: Good job! Regression models are used when you wish to predict variables that measure the amount of something.
19. Continuous and discrete are the two types of what kind of variable?
- Independent.
- Quantitative. (CORRECT)
- Qualitative.
- Categorical.
Correct: Good job! Continuous variables can take on all values in a given interval. Discrete variables can take on values from a given list.
20. Consider the assumptions in simple linear regression. What is the assumption of homogeneity of variance?
- Some independent variables have no variance.
- All independent variables have similar variance. (CORRECT)
- The data creates a normal distribution.
- The independent variables do not affect one another.
Correct: Good job! Homogeneity means the same or very similar in mathematics. So, having the variance be similar across all variables is homogeneity of variance.
21. In time series analysis, what does the assumption of dependence mean?
- That time must be one of the independent variables.
- That the dependent variable does not change over time.
- That all the observations come from the same place. (CORRECT)
- That the dependent variable depends on the independent variable.
Correct: Good job! Dependence is the opposite of independence, which we have discussed in the past. This assumption requires that the source of the observations does not change.
22. In time series analysis, what is the assumption that the mean value of the series is constant called?
- Minimum sample size.
- Constant time.
- Dependence.
- Stationarity (CORRECT)
Correct: Good job! In mathematics and statistics, stationarity is an advantageous characteristic to have in your data. It allows for a host of statistical techniques, one being time series.
23. Which of the following is an assumption in simple linear regression?
- I. Linearity
- II. Variability
- III. Dependence
- All of them are.
- III.
- II.
- I. (CORRECT)
Correct: Good job! Linearity means that the graph of the regression will be a line.
24. Consider the assumptions in simple linear regression. What is the assumption of minimum sample size?
- The minimum sample is 10.
- The minimum sample is 20. (CORRECT)
- The minimum sample is 50.
- The minimum sample is 100.
Correct: Good job! Simple linear regression is a popular model choice, and its low minimum sample size is one of its attractive features.
25. What is the assumption of minimum sample size for K-means clustering?
- 50 times the number of clusters. (CORRECT)
- 10 times the number of clusters.
- 100 times the number of clusters.
- 20 times the number of clusters.
Correct: Good job! The K in K-means refers to the number of clusters. Fifty times this is the minimum sample size.
26. In time series analysis, what is the assumed minimum sample size if time is measured in quarters?
- 4 quarters of information.
- 50 quarters of information.
- 40 quarters of information. (CORRECT)
- 100 quarters of information.
Correct: Good job! The minimum sample size in time series is not fixed like most of the other analytic methods we’ve discussed. It varies based on the size of the time units being used.
27. Quantitative variables consist of two types. What are they?
- Known and unknown.
- Continuous and discrete. (CORRECT)
- Continuous and smooth.
- Countable and discrete.
Correct: Good job! Continuous variables can take on all values in a given interval. Discrete variables can take on values from a given list.
28. In time series analysis, you can convert shorter time units into longer time units.
- False
- True. (CORRECT)
Correct: Good job! You can always convert shorter units of time into longer ones, but you cannot do the opposite.
29. What is the assumption of equal prior probability for K-means clustering?
- That each cluster has the same probability of happening. (CORRECT)
- That one of the clusters is more likely to occur than the others.
- That each data value has the same probability of occurring.
- That each cluster has the same number of data points.
Correct: Good job! It is assumed in K-means that each of the groups, or clusters, is no more likely to occur than any other.
30. What is the difference between a classification model and a regression model?
- A classification model predicts a qualitative variable, while a regression model predicts a quantitative variable. (CORRECT)
- A classification model is used with qualitative data, while a regression model works with categorical data.
- A classification model is used with numerical data, while a regression model works with qualitative variables.
- A classification model predicts a quantitative variable, while a regression model predicts a qualitative variable.
Correct: Good job! This is a fundamental difference between the two. With very few exceptions, the models in either group cannot cross over.
31. A Classification Model is…
- A statistical model created using a classification analysis (CORRECT)
- The results of a regression analysis.
- A type of analysis.
- The process of applying a classification analysis.
Correct: Exactly! This would be the result of modeling with a classification analysis.
32. Data modeling is useful in marketing analytics because…
- You can predict the number of clicks you will get from an advertising campaign based on how much you spent on it.
- You can predict how much you will sell next year.
- You can segment your market to better understand the needs of specific parts of your customer base.
- These are all correct. (CORRECT)
Correct: Exactly! These are all useful tools for a marketing analyst.
33. Which of the following is NOT a model?
- Cluster
- Progression (CORRECT)
- Classification
- Regression
Correct: Exactly! Progression is not a model family; cluster, classification, and regression models all are.
34. You can convert a time variable that is recorded in weeks to days. True or False?
- True
- False (CORRECT)
Correct: Exactly! You cannot convert a larger unit of time into a smaller unit of time.
35. Independent Variable: Quantitative
Dependent Variable: Quantitative
Purpose: Predict the Dependent Variable using the Independent Variable
Which model is most appropriate for the conditions listed above?
- Cluster Analysis
- Classification Analysis
- Time Series Analysis
- Simple Linear Regression (CORRECT)
Correct: Exactly! If you are predicting a dependent variable using an independent variable, you will use a simple linear regression model.
36. If the Calla and Ivy ad received 5,000 impressions, how many times will it be clicked? What type of model will best answer this question?
- Simple Linear Regression (CORRECT)
- Time Series
- Cluster
- Classification
Correct: Exactly! You will be predicting a quantitative dependent variable using a quantitative independent variable. Simple Linear Regression is most appropriate.
37. How many sales can Carlos from Inu + Neko expect to receive from emailed ads next year? What type of model will best answer this question?
- Cluster
- Time Series (CORRECT)
- Classification
- Simple Linear Regression
Correct: Exactly! Since you are trying to forecast a dependent variable using time as your independent variable, time series is the most appropriate model.
38. What type of model will help Paola forecast customer spending based on how long they have been customers?
- Classification
- Cluster
- Time Series (CORRECT)
- Simple Linear Regression
Correct: Exactly! You are predicting customer purchases based on time. This describes a time series analysis.