COURSE 6: THE NUTS AND BOLTS OF MACHINE LEARNING

Module 2: Workflow for Building Complex Models

GOOGLE ADVANCED DATA ANALYTICS PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

INTRODUCTION – Workflow for Building Complex Models

This module walks through the structured workflow that data professionals follow on machine learning projects. It covers the main steps of that workflow in order and explains what each stage contributes when machine learning is applied to a real-world problem. Because each step builds on the one before it, understanding their sequence is essential to implementing machine learning effectively.

The module emphasizes applying machine learning models to specific business problems. By the end, participants should understand the workflow conceptually and have hands-on practice using machine learning techniques to draw insights and solutions from business data.

Learning Objectives

  • Identify and apply model validation techniques
  • Articulate how Naive Bayes models work and what they’re used for
  • Construct a Naive Bayes model
  • Apply feature engineering techniques using Python
  • Describe the way PACE informs each step of the data science end-to-end workflow for ML

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: PACE IN MACHINE LEARNING: THE PLAN AND ANALYZE STAGES

1. Fill in the blank: Feature engineering enables data professionals to take _____ and extract features from it.

  • raw data (CORRECT)
  • delimited text
  • a dynamic dashboard
  • a code chunk

Correct: Feature engineering enables data professionals to take raw data and extract certain attributes from it. It also enables them to select, extract, or transform characteristics and properties from raw data.

2. What term describes the process of modifying existing features in a way that improves accuracy when training a model?

  • Feature transformation (CORRECT)
  • Feature improvement
  • Feature extraction
  • Feature selection

Correct: Feature transformation describes the process of modifying existing features in a way that improves accuracy when training the model.

3. A class imbalance occurs when a dataset has a predictor variable that contains an equal number of instances of all possible outcomes.

  • True
  • False (CORRECT)

Correct: A class imbalance occurs when a dataset has a predictor variable that contains many more instances of one outcome than another.

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: PACE IN MACHINE LEARNING: THE CONSTRUCT AND EXECUTE STAGES

1. Fill in the blank: Posterior probability is the probability of an event occurring after considering _____ information.

  • undefined
  • new (CORRECT)
  • historical
  • conditional

Correct: Posterior probability is the probability of an event occurring after taking into consideration new information.
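
As a worked illustration of posterior probability, the sketch below applies Bayes's theorem, P(A|B) = P(B|A) × P(A) / P(B). The churn and complaint rates are invented numbers chosen only for illustration, and the function name `posterior` is hypothetical.

```python
def posterior(prior, likelihood, evidence):
    """Bayes's theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Hypothetical numbers, for illustration only.
p_churn = 0.20                   # prior: P(churn)
p_complaint_given_churn = 0.50   # likelihood: P(complaint | churn)
p_complaint_given_stay = 0.10    # P(complaint | no churn)

# Evidence from the law of total probability: P(complaint)
p_complaint = (p_complaint_given_churn * p_churn
               + p_complaint_given_stay * (1 - p_churn))

# Posterior: P(churn | complaint)
p_churn_given_complaint = posterior(p_churn, p_complaint_given_churn, p_complaint)
print(round(p_churn_given_complaint, 3))
```

With these assumed inputs, learning that a customer complained (the new information) raises the probability of churn from the 0.20 prior to roughly 0.56.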

2. A data professional would use MinMaxScaler to normalize the columns in a dataset so that each value falls between zero and one.

  • True (CORRECT)
  • False

Correct: A data professional would use MinMaxScaler to normalize the columns in a dataset so that each value falls between zero and one. Each column’s minimum value scales to zero, and its maximum value scales to one. Everything else falls somewhere in between.
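
The arithmetic behind min-max scaling can be sketched in plain Python. This is a hand-written stand-in for illustration, not scikit-learn's MinMaxScaler itself; the helper names `fit_min_max` and `transform_min_max` and the sample values are hypothetical, and the sketch assumes the column is not constant (otherwise the divisor is zero).

```python
def fit_min_max(column):
    """Learn the minimum and maximum from the training column only."""
    return min(column), max(column)

def transform_min_max(column, col_min, col_max):
    """Apply x_scaled = (x - min) / (max - min) using the learned values."""
    span = col_max - col_min   # assumed non-zero in this sketch
    return [(x - col_min) / span for x in column]

train = [10.0, 20.0, 30.0, 50.0]   # hypothetical training values
test = [15.0, 60.0]                # hypothetical test values

# Fit on the training data, then transform both sets with the same values.
col_min, col_max = fit_min_max(train)
train_scaled = transform_min_max(train, col_min, col_max)
test_scaled = transform_min_max(test, col_min, col_max)
```

Here `train_scaled` is `[0.0, 0.25, 0.5, 1.0]` and `test_scaled` is `[0.125, 1.25]`. The second test value lands above one because the minimum and maximum were learned from the training data alone, which is the point of fitting once and reusing the same scaler.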

3. A data professional has built a model, and now they are adjusting how features are engineered in order to improve performance. Which PACE stage does this scenario describe?

  • Construct
  • Analyze
  • Execute (CORRECT)
  • Plan

Correct: This scenario describes the Execute stage. Building a model is an iterative process, so it’s essential to improve upon results by changing specific parameters or how features are engineered to continue improving performance.

QUIZ: MODULE 2 CHALLENGE

1. Which of the following statements accurately describe feature engineering? Select all that apply.

  • Feature transformation involves selecting the features in the data that contribute the most to predicting the response variable.
  • Feature engineering involves selecting, transforming, or extracting elements from within raw data. (CORRECT)
  • In feature engineering, a data professional may use their practical, statistical, and data science knowledge. (CORRECT)
  • Feature extraction involves taking multiple features to create a new one that will improve the accuracy of the algorithm. (CORRECT)

2. A data professional resolves a class imbalance in a very large dataset. They alter the majority class by using fewer of the original data points in order to produce a split that is more even. What does this scenario describe?

  • Upsampling
  • Merging
  • Downsampling (CORRECT)
  • Smoothing
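
Downsampling as described in the question above can be sketched as follows. The `downsample` helper, the record layout, and the 90/10 split are assumptions made for this example, not part of the course material.

```python
import random

def downsample(rows, label_key, majority_label, target_size, seed=0):
    """Keep every minority row and a random subset of the majority rows."""
    majority = [r for r in rows if r[label_key] == majority_label]
    minority = [r for r in rows if r[label_key] != majority_label]
    rng = random.Random(seed)                  # seeded for reproducibility
    kept = rng.sample(majority, target_size)   # sample without replacement
    balanced = kept + minority
    rng.shuffle(balanced)
    return balanced

# Hypothetical dataset: 90 customers who stayed (0) and 10 who churned (1).
rows = [{"id": i, "churned": 0} for i in range(90)]
rows += [{"id": i, "churned": 1} for i in range(90, 100)]

balanced = downsample(rows, "churned", majority_label=0, target_size=10)
counts = {label: sum(r["churned"] == label for r in balanced) for label in (0, 1)}
print(counts)
```

The result keeps all 10 minority rows and 10 randomly chosen majority rows, giving an even split. The sketch assumes `target_size` does not exceed the number of majority rows.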

3. Fill in the blank: Customer churn is the business term that describes how many customers stop _____ and at what rate this occurs.

  • researching a company’s offerings
  • using a product or service (CORRECT)
  • sharing feedback with a company
  • reviewing items online

4. Naive Bayes is a supervised classification technique that assumes independence among predictors. What is the meaning of this concept?

  • The value of a predictor variable on a given class is dependent upon the values of other predictors.
  • The value of a predictor variable on a given class is measured by the values of other predictors.
  • The value of a predictor variable on a given class is equal to the values of other predictors.
  • The value of a predictor variable on a given class is not affected by the values of other predictors. (CORRECT)

5. Fill in the blank: When using a scaler to _____ the columns in a dataset using MinMaxScaler, a data professional must fit the scaler to the training data and transform both the training data and the test data using that same scaler.

  • customize
  • filter
  • sort
  • normalize (CORRECT)

6. A data professional evaluates a model’s performance and considers how it can be improved. Which PACE stage does this scenario describe?

  • Analyze
  • Plan
  • Construct
  • Execute (CORRECT)

7. In the model-development process, which type of feature is useful by itself because it contains information that will be useful when forecasting the target?

  • Redundant
  • Irrelevant
  • Predictive (CORRECT)
  • Interactive

8. Fill in the blank: Log normalization is useful when working with a model that cannot manage continuous variables with _____ distributions.

  • binomial
  • probability
  • normal
  • skewed (CORRECT)
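
To illustrate why taking the log helps with a skewed feature, the sketch below uses a hypothetical right-skewed column in which each value is ten times the previous one. The variable names and values are assumptions for this example only.

```python
import math

# Hypothetical right-skewed feature, e.g. account balances.
balances = [1.0, 10.0, 100.0, 1000.0, 10000.0]

# Natural log of each value; this requires every value to be positive.
log_balances = [math.log(x) for x in balances]

# Gaps between neighbouring values before and after the transformation.
raw_gaps = [b - a for a, b in zip(balances, balances[1:])]
log_gaps = [b - a for a, b in zip(log_balances, log_balances[1:])]

print(raw_gaps)
print([round(g, 3) for g in log_gaps])
```

On the raw scale the gaps grow from 9 to 9000, so the largest values dominate. After the log transformation every gap equals log(10), about 2.303, which pulls in the long right tail.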

9. A data professional discovers that the dataset they are working with contains a class imbalance. The majority class comprises 90% of the data and the minority class comprises 10% of the data. Which of the following statements best describe the impact of this class imbalance?

  • Major issues should not arise if the majority class makes up 10% or less of the dataset.
  • Major issues should not arise because the data has a 50-50 split of outcomes.
  • Major issues will arise if the data professional decides to rebalance the dataset.
  • Major issues will arise because the majority class makes up 90% or more of the dataset. (CORRECT)

10. Fill in the blank: Customer churn is a business term that describes how many customers stop _____ and at what rate this occurs.

  • writing positive reviews about a company
  • doing business with a company (CORRECT)
  • returning items to a company
  • contacting a company’s customer relations department

11. What does Bayes’s theorem enable data professionals to calculate?

  • Data accuracy
  • Posterior probability (CORRECT)
  • Causation
  • Margin of error

12. Fill in the blank: When normalizing the columns in a dataset using MinMaxScaler, the columns’ maximum value scales to one, and the minimum value scales to _____. Everything else falls somewhere in between.

  • .5
  • -1
  • 0.1
  • 0 (CORRECT)

13. In the model-development process, which type of feature is not useful by itself for predicting the target variable, but becomes predictive in conjunction with other features?

  • Predictive
  • Irrelevant
  • Redundant
  • Interactive (CORRECT)

14. Bayes’s theorem enables data professionals to calculate posterior probability for a data project. What does posterior probability describe?

  • The likelihood of an event occurring after taking into consideration all new, relevant observations and information (CORRECT)
  • The likelihood of an event occurring after taking into consideration only the most suitable observations and information
  • The likelihood of an event occurring based upon only observations and information that align with current hypotheses
  • The likelihood of an event occurring based upon the observations and information that were available at the start of the data project

15. A data professional assesses a business need in order to determine what type of model is best suited to a project. Which PACE stage does this scenario describe?

  • Analyze
  • Construct
  • Execute
  • Plan (CORRECT)

16. Fill in the blank: Log normalization involves taking the log of a _____ feature and making the data more effective for modeling.

  • skewed (CORRECT)
  • continuous
  • normal
  • probable

17. Fill in the blank: Log normalization involves reducing _____ in order to make the data more effective for modeling.

  • probability
  • skew (CORRECT)
  • continuity
  • normality

18. In the model-development process, which type of feature does not contain any useful information for predicting the target variable?

  • Predictive
  • Irrelevant (CORRECT)
  • Conducive
  • Relevant

19. Which of the following statements accurately describe feature engineering? Select all that apply.

  • Feature engineering does not involve using a data professional’s statistical knowledge.
  • Feature engineering may involve transforming the properties of raw data. (CORRECT)
  • In feature engineering, feature selection involves choosing the features in the data that contribute the most to predicting the response variable. (CORRECT)
  • In feature engineering, feature extraction involves taking multiple features to create a new one that will improve the accuracy of the algorithm. (CORRECT)

20. Which of the following statements accurately describe the general categories of feature engineering? Select all that apply.

  • Feature selection involves taking multiple features to create a new one that will improve the accuracy of the algorithm.
  • Feature extraction involves choosing the features in the data that contribute the most to predicting the response variable.
  • Feature transformation involves modifying existing features in a way that improves accuracy when training a model. (CORRECT)
  • The three general categories of feature engineering are selection, extraction, and transformation. (CORRECT)
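
The three categories named above can be shown side by side on a tiny example. The customer records and feature names are hypothetical, and treating `id` as uninformative is an assumption made for this sketch.

```python
# Hypothetical customer records.
customers = [
    {"id": 1, "total_spend": 200.0, "num_orders": 4, "tenure_days": 730},
    {"id": 2, "total_spend": 90.0, "num_orders": 3, "tenure_days": 365},
]

engineered = []
for c in customers:
    engineered.append({
        # Selection: keep num_orders; drop id, assumed to carry no signal.
        "num_orders": c["num_orders"],
        # Extraction: combine two features into a new one.
        "avg_order_value": c["total_spend"] / c["num_orders"],
        # Transformation: modify an existing feature (days to years).
        "tenure_years": c["tenure_days"] / 365,
    })

print(engineered)
```

Whether a derived feature such as `avg_order_value` actually improves a model depends on the data, so it would normally be checked during validation rather than assumed.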

21. A data professional works with a dataset for a project with their company’s human resources team. They discover that the dataset has a predictor variable that contains more instances of one outcome than another. What will occur as a result of this scenario?

  • Class imbalance (CORRECT)
  • Inconsistent data
  • Incompatibility
  • Redundancy

22. A data professional examines a dataset to reveal key details about the data that will help inform the plans for building a model. Which PACE stage does this scenario describe?

  • Execute
  • Plan
  • Construct
  • Analyze (CORRECT)

23. Fill in the blank: When normalizing the columns in a dataset using MinMaxScaler, the columns’ maximum value scales to _____, and the minimum value scales to zero. Everything else falls somewhere in between.

  • 10
  • .5
  • 100
  • 1 (CORRECT)

24. Fill in the blank: Customer _____ is the business term that describes how many customers stop using a product or service, or stop doing business with a company altogether, and at what rate this occurs.

  • churn (CORRECT)
  • exchange
  • retention
  • transfer

25. Fill in the blank: Naive Bayes is a supervised classification technique that is based on Bayes’ Theorem, with an assumption of _____ among predictors.

  • interdependence
  • even distribution
  • clear hierarchy
  • independence (CORRECT)

Correct: Naive Bayes is a supervised classification technique that is based on Bayes’ Theorem, with an assumption of independence among predictors.
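
The independence assumption is what lets a Naive Bayes classifier multiply one likelihood per feature. Below is a minimal sketch for categorical features, written from scratch rather than taken from any library; the function names and the eight-row dataset are hypothetical, and no smoothing is applied, so a feature value never seen with a class gives that class a probability of zero.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def predict_proba(row, class_counts, value_counts):
    """Posterior per class: prior times the product of per-feature likelihoods."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total                                  # prior P(class)
        for i, value in enumerate(row):
            score *= value_counts[(i, label)][value] / n   # P(value | class)
        scores[label] = score
    evidence = sum(scores.values())       # assumed non-zero in this sketch
    return {label: s / evidence for label, s in scores.items()}

# Hypothetical data: (plan, complained) -> outcome
rows = [("monthly", "yes"), ("monthly", "yes"), ("monthly", "no"), ("annual", "yes"),
        ("annual", "no"), ("annual", "no"), ("monthly", "no"), ("annual", "yes")]
labels = ["churn", "churn", "stay", "churn", "stay", "stay", "stay", "stay"]

class_counts, value_counts = train_naive_bayes(rows, labels)
probs = predict_proba(("monthly", "yes"), class_counts, value_counts)
print({label: round(p, 3) for label, p in probs.items()})
```

For a monthly customer who complained, the unnormalized scores are 3/8 × 2/3 × 3/3 = 0.25 for churn and 5/8 × 2/5 × 1/5 = 0.05 for stay, so the posterior probability of churn is about 0.833.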

26. In classification techniques, what is the term for the proportion of all actual positives that are identified correctly?

  • Accuracy
  • Precision
  • Recall (CORRECT)
  • F1 score

Correct: Recall is the proportion of all actual positives that are identified correctly, that is, true positives divided by the sum of true positives and false negatives.
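
Recall can be computed directly from true and predicted labels, and it helps to see it next to precision. The helper names and the label vectors below are hypothetical, and the sketch assumes at least one actual positive and at least one predicted positive.

```python
def recall(y_true, y_pred, positive=1):
    """True positives divided by all actual positives (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    false_neg = sum(t == positive and p != positive for t, p in pairs)
    return true_pos / (true_pos + false_neg)

def precision(y_true, y_pred, positive=1):
    """True positives divided by all predicted positives (TP + FP)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    false_pos = sum(t != positive and p == positive for t, p in pairs)
    return true_pos / (true_pos + false_pos)

# Hypothetical labels: 4 actual positives, 5 predicted positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

model_recall = recall(y_true, y_pred)
model_precision = precision(y_true, y_pred)
print(model_recall, model_precision)
```

The model finds 3 of the 4 actual positives, so recall is 0.75, while only 3 of its 5 positive predictions are right, so precision is 0.6.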

CONCLUSION – Workflow for Building Complex Models

This module gives participants a foundation for working through machine learning workflows. Following a structured approach prepares learners to tackle real-world business problems, and the emphasis on practical application connects the theoretical concepts to actionable results.

As participants progress through the course, they build both a theoretical understanding of the workflow and the hands-on skills needed to apply machine learning models to a range of business scenarios.