COURSE 5: REGRESSION ANALYSIS: SIMPLIFY COMPLEX DATA RELATIONSHIPS

Module 1: Introduction to Complex Data Relationships

GOOGLE ADVANCED DATA ANALYTICS PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

Introduction to Complex Data Relationships

This section introduces regression models and walks through each step of the modeling process. It begins with the assumptions that regression models rely on and the techniques used to interpret their results, so that learners can build sound regression models.

The section covers two main types of regression, linear and logistic, and explains how data professionals use each type to address different business problems. By working through real-world applications, participants connect the theory to the practical use of regression models for decision-making in a range of business contexts.

Learning Objectives

  • Define logistic regression
  • Define link function
  • Define generalized linear model (GLM)
  • Determine possible use-cases for linear and logistic regression
  • Distinguish different kinds of data required for linear vs. logistic regression
  • Explain the need for a link function in a GLM
  • Describe a generalized linear model (GLM)
  • Define linear and logistic regression on a high-level
  • Describe positive and negative correlation
  • Explain PACE in regression modeling
  • Connect statistical concepts (distributions, sampling) to regression modeling
  • Connect EDA to regression models
  • Identify the importance of model assumptions, model validation, model construction, model evaluation, and model interpretation in regression modeling
  • Define regression model

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: PACE IN REGRESSION ANALYSIS

1. In regression modeling, which statement describes the PACE plan stage?

  • Building the regression model in a coding language
  • Preparing formal results and visualizations for stakeholders
  • Understanding the data in the context of a problem (CORRECT)
  • Examining data more closely to choose an appropriate model

Correct: In regression modeling, understanding the data in the context of a problem describes the PACE plan stage. During the plan stage, a data professional considers what data they have access to, how the data was collected, and what the business needs are.

2. In which PACE stage does a data professional initially check the model assumptions?

  • Analyze (CORRECT)
  • Execute
  • Construct
  • Plan

Correct: During the analyze stage, a data professional initially checks the model assumptions.

3. What three tasks typically occur during the PACE construct stage? Select all that apply.

  • Present the visualizations to stakeholders
  • Evaluate the model results (CORRECT)
  • Re-check and confirm the model assumptions (CORRECT)
  • Build the model (CORRECT)

Correct: A data professional builds the model, rechecks and confirms model assumptions, and evaluates results in the construct stage.

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: LINEAR REGRESSION

1. What technique estimates the linear relationship between a continuous dependent variable and one or more independent variables?

  • Model validation
  • Causation
  • Intercept
  • Linear regression (CORRECT)

Correct: Linear regression estimates the linear relationship between a continuous dependent variable and one or more independent variables.
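
The idea of estimating a linear relationship can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: it assumes ordinary least squares as the estimation method, and the data (ad_spend, sales) and the function name are hypothetical.

```python
from statistics import mean

def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the ordinary least squares line."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of X
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: X is the independent variable, Y the dependent variable.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_simple_linear_regression(ad_spend, sales)

# The fitted line estimates the mean of Y for a given value of X.
predicted_sales = intercept + slope * 6.0
print(f"intercept={intercept:.2f}, slope={slope:.2f}, prediction={predicted_sales:.2f}")
```

Because the made-up data lie exactly on a line, the fit recovers that line exactly. Real data would scatter around the fitted line.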

2. Which of the following statements accurately describe dependent and independent variables? Select all that apply.

  • The independent variable tends to vary based on the values of dependent variables.
  • The dependent variable is the variable the given model estimates. (CORRECT)
  • The dependent variable tends to vary based on the values of independent variables. (CORRECT)
  • Independent variables are also referred to as explanatory or predictor variables. (CORRECT)

Correct: The dependent variable is the variable the given model estimates. It tends to vary based on the values of independent variables. Independent variables are also referred to as explanatory or predictor variables.

3. What term describes an inverse relationship between two variables?

  • Intercept
  • Slope
  • Negative correlation (CORRECT)
  • Positive correlation
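
To make the difference between positive and negative correlation concrete, the sketch below computes the Pearson correlation coefficient for two made-up pairs of variables. The data and variable names are hypothetical, and Pearson's r is assumed here as the measure of correlation.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return cov / sqrt(sxx * syy)

# Hypothetical data for illustration only.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 71, 80, 88]   # tends to rise as hours rise
errors_made = [9, 7, 6, 4, 2]       # tends to fall as hours rise

r_positive = pearson_r(hours_studied, exam_score)
r_negative = pearson_r(hours_studied, errors_made)

print(f"hours vs. score:  r = {r_positive:.3f}  (positive correlation)")
print(f"hours vs. errors: r = {r_negative:.3f}  (negative correlation)")
```

A coefficient above zero indicates that the variables tend to move together. A coefficient below zero indicates an inverse relationship.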

4. Fill in the blank: The goal of regression analysis is to use math to define the _____ between the sample X’s and Y’s in order to understand how the variables interact.

  • independence
  • value
  • model
  • relationship (CORRECT)

Correct: The goal of regression analysis is to define a relationship mathematically between the sample X’s and Y’s in order to understand how the variables interact.

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: LOGISTIC REGRESSION

1. What is a nonlinear function that connects or links a dependent variable to the independent variables mathematically?

  • Regression function
  • Link function (CORRECT)
  • Relationship function
  • Loss function

Correct: The link function connects, or links, a dependent variable to the independent variables mathematically. Data professionals use the link function to express the relationship between the X’s and the probability that Y equals some outcome.
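
As a rough illustration of a link function, the sketch below uses the logit, which is the link function used in logistic regression. The coefficients beta0 and beta1 and the input value are hypothetical, chosen only to show how a linear combination of X maps to a probability between 0 and 1.

```python
from math import exp, log

def logit(p):
    """Link function: maps a probability in (0, 1) to the whole real line."""
    return log(p / (1 - p))

def inverse_logit(z):
    """Inverse link (sigmoid): maps any real number back to a probability."""
    return 1 / (1 + exp(-z))

# Hypothetical coefficients, for illustration only.
beta0, beta1 = -1.5, 0.8
x = 2.0

linear_predictor = beta0 + beta1 * x          # can be any real number
probability = inverse_logit(linear_predictor)  # always between 0 and 1

print(f"linear predictor = {linear_predictor:.2f}")
print(f"P(Y = 1 | X = {x}) = {probability:.3f}")
```

The linear predictor is unbounded, while a probability must stay between 0 and 1. The link function reconciles the two, which is why a GLM needs one.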

2. What type of regression models a categorical variable based on one or more independent variables?

  • Logistic regression (CORRECT)
  • Ordinary regression
  • Coefficient regression
  • Linear regression

Correct: Logistic regression models a categorical variable based on one or more independent variables. The dependent variable can have two or more possible discrete values.
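
For the two-category case, a logistic regression can be sketched with plain gradient ascent on the log-likelihood. This is a simplified illustration under stated assumptions, not a production method: the data (visits, returned), the learning rate, and the iteration count are all hypothetical, and in practice a statistics library would normally do the fitting.

```python
from math import exp

def sigmoid(z):
    """Inverse of the logit link: maps a real number to a probability."""
    return 1.0 / (1.0 + exp(-z))

def fit_logistic_regression(x, y, learning_rate=0.1, n_iter=5000):
    """Fit P(Y=1 | X) = sigmoid(b0 + b1 * X) by gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        grad0, grad1 = 0.0, 0.0
        for xi, yi in zip(x, y):
            error = yi - sigmoid(b0 + b1 * xi)  # observed minus predicted
            grad0 += error
            grad1 += error * xi
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1

# Hypothetical data: number of prior visits (X) and whether the
# customer returned (Y: 1 = yes, 0 = no). Y is categorical.
visits = [1, 2, 3, 4, 5, 6, 7, 8]
returned = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic_regression(visits, returned)

for v in (1, 8):
    print(f"P(return | visits={v}) = {sigmoid(b0 + b1 * v):.3f}")
```

Unlike linear regression, the model's output is a probability for each category rather than an estimate of a continuous value.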

QUIZ: MODULE 1 CHALLENGE

1. Fill in the blank: Regression models are groups of _____ techniques that use data to estimate the relationships between a single dependent variable and one or more independent variables.

  • application
  • exploratory data
  • coding
  • statistical (CORRECT)

2. Simple linear regression finds the _____ given a particular value of X.

  • mean of Y (CORRECT)
  • regression coefficients
  • Y intercept
  • median of Y

3. A data professional considers what data they have access to and how to view that data in a problem context. What PACE stage are they working in?

  • Plan (CORRECT)
  • Construct
  • Analyze
  • Execute

4. What technique estimates the relationship between a continuous dependent variable and one or more independent variables?

  • Linear regression (CORRECT)
  • Complex regression
  • Logistic regression
  • Ethical regression

5. Which of the following statements accurately describe dependent and independent variables? Select all that apply.

  • A dependent variable is often represented by X.
  • An independent variable is the variable a given model estimates.
  • A dependent variable is the variable a given model estimates. (CORRECT)
  • An independent variable is often represented by X. (CORRECT)

6. What describes a relationship in which one variable directly leads another to change in a particular way?

  • Intercept
  • Correlation
  • Causation (CORRECT)
  • Slope

7. A data professional reviews existing samples of data for both the dependent and independent variables. What is the term for this data sample?

  • Observed values (CORRECT)
  • Link functions
  • Parameters
  • Intercepts

8. A veterinary practice wants to determine whether most new patients will choose to return for follow-up care. A data analyst for the practice investigates this issue by modeling a categorical variable based on one or more independent variables. What technique do they use?

  • Logistic regression (CORRECT)
  • Coefficient regression
  • Linear regression
  • Slope regression

9. A data professional wants to connect the dependent variable and independent variable mathematically. What function can enable them to make this connection?

  • Coefficient function
  • Link function (CORRECT)
  • Coefficient regression
  • Link regression

10. What group of statistical techniques uses data to estimate the relationships between a single dependent variable and one or more independent variables?

  • Regression analysis (CORRECT)
  • Estimation coefficients
  • Regression coefficients
  • Estimation analysis

11. Simple linear regression finds the mean of Y _____.

  • for every observation
  • given a particular value of X (CORRECT)
  • to predict a probability
  • as X approaches zero

12. A data professional creates a model in Python and rechecks the model assumptions. What PACE stage are they working in?

  • Plan
  • Construct (CORRECT)
  • Analyze
  • Execute

13. Fill in the blank: _____ is a technique that estimates the relationship between a continuous dependent variable and one or more independent variables.

  • Logistic regression
  • Linear regression (CORRECT)
  • Complex regression
  • Ethical regression

14. What is an inverse relationship between two variables, in which the other variable tends to decrease when one variable increases?

  • Positive correlation
  • Negative causation
  • Negative correlation (CORRECT)
  • Positive causation

15. A data professional creates a linear regression equation and reviews the properties of the population, sometimes written as μ of Y (mu of Y) and the betas (β). What term describes this portion of the equation?

  • Lines
  • Intercepts
  • Parameters (CORRECT)
  • Slopes

16. A roadside assistance company wants to identify the probability of its customers renewing their annual membership. The analytics team looks into this topic by modeling a categorical variable based on one or more independent variables. What technique do they use?

  • Linear regression
  • Coefficient regression
  • Slope regression
  • Logistic regression (CORRECT)

17. What is a nonlinear function that connects the dependent variable to the independent variables mathematically?

  • Link regression
  • Coefficient regression
  • Link function (CORRECT)
  • Coefficient function

18. How many dependent variables typically exist in a regression model?

  • Four
  • Two
  • One (CORRECT)
  • Three

19. A data professional closely examines their data to choose a model that is appropriate to the problem they want to solve. What PACE stage are they working in?

  • Execute
  • Construct
  • Plan
  • Analyze (CORRECT)

20. A data professional reviews the estimated betas, often designated with a hat symbol. What is the term for this estimated beta?

  • Slope coefficients
  • Regression coefficients (CORRECT)
  • Regression intercepts
  • Parameter intercepts

21. Fill in the blank: A _____ connects the dependent variable to the independent variables mathematically.

  • Link function (CORRECT)
  • Coefficient function
  • Coefficient regression
  • Link regression

22. A data professional is estimating the relationship between a continuous dependent variable and one or more independent variables. What technique are they using?

  • Linear regression (CORRECT)
  • Complex regression
  • Logistic regression
  • Ethical regression

23. What is a relationship between two variables that tend to increase or decrease together?

  • Positive causation
  • Negative correlation
  • Positive correlation (CORRECT)
  • Negative causation

24. Which of the following statements accurately describe dependent and independent variables? Select all that apply.

  • Independent variables tend to vary based on the values of dependent variables.
  • Independent variables are typically represented by Y.
  • Dependent variables tend to vary based on the values of independent variables. (CORRECT)
  • Dependent variables are typically represented by Y. (CORRECT)

25. A sporting equipment manufacturer wants to know the likelihood of its customers choosing to reorder a particular item. The data team researches this question by modeling a categorical variable based on one or more independent variables. What technique do they use?

  • Coefficient regression
  • Linear regression
  • Logistic regression (CORRECT)
  • Slope regression

26. _____ finds the mean of Y given a particular value of X.

  • β
  • Logistic regression
  • Simple linear regression (CORRECT)
  • Function integration

27. Which of the following statements accurately describe dependent and independent variables? Select all that apply.

  • A dependent variable is also called the explanatory or predictor variable.
  • An independent variable is also called the response or outcome variable.
  • An independent variable is typically represented by X. (CORRECT)
  • A dependent variable is typically represented by Y. (CORRECT)

28. What are model assumptions?

  • The processes associated with converting model statistics into statements describing the relationships between the variables in the data
  • Ways to measure how well a model fits the data
  • The processes associated with building a model
  • Statements about the data that must be true to justify the use of particular data science techniques (CORRECT)

Correct: Model assumptions are statements about the data that must be true to justify the use of particular data science techniques. Data professionals use model assumptions to add validity to their conclusions. If model assumptions are true, then they can have more confidence in the results of a model.

29. It is often not possible to calculate the true values of parameters.

  • True (CORRECT)
  • False

Correct: Parameters are properties of populations and not samples, so it is often impossible to calculate their true value because it is usually the case that the whole population cannot be observed. In these cases, it is possible to calculate estimates of parameters using sample data.  
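
The distinction between parameters and their estimates can be shown with a small simulation. In the sketch below, the "true" parameter values are hypothetical and known only because the data are simulated. With real data they would be unknown, and only the sample estimate would be available.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the simulated sample is reproducible

# Hypothetical population parameters (unknown in a real analysis).
TRUE_BETA0, TRUE_BETA1 = 2.0, 0.5

def draw_sample(n):
    """Simulate n observations of Y = beta0 + beta1 * X + random noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [TRUE_BETA0 + TRUE_BETA1 * x + random.gauss(0, 1) for x in xs]
    return xs, ys

def estimate_slope(xs, ys):
    """Least squares estimate of the slope (the 'beta hat' for X)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

xs, ys = draw_sample(200)
slope_hat = estimate_slope(xs, ys)

print(f"true slope (parameter):         {TRUE_BETA1}")
print(f"estimated slope (from sample):  {slope_hat:.3f}")
```

The estimate is close to the parameter but not identical to it, and a different sample would give a slightly different estimate.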

30. What technique models a categorical variable based on one or more independent variables?

  • Loss function
  • Link function
  • Regression coefficients
  • Logistic regression (CORRECT)

Correct: Logistic regression models a categorical variable based on one or more independent variables. The dependent variable in a logistic regression can have two or more possible discrete values.

CONCLUSION to Introduction to Complex Data Relationships

This section introduced regression modeling: the assumptions behind regression models, how to interpret their results, and how linear and logistic regression are applied. By following the steps of building and analyzing a regression model, participants have seen how these statistical techniques are used in practice. With this foundation, learners are prepared to apply regression models to a range of business problems and to support data-driven decisions.