COURSE 6: THE NUTS AND BOLTS OF MACHINE LEARNING

Module 4: Tree-Based Modeling

GOOGLE ADVANCED DATA ANALYTICS PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

INTRODUCTION – Tree-Based Modeling

Throughout this section, the focus will be on supervised learning, a pivotal aspect of machine learning. Participants will delve into the intricacies of testing and validating the performance of various supervised machine learning models, including decision trees, random forests, and gradient boosting.

The comprehensive exploration of these models equips learners with the skills to understand, implement, and evaluate their effectiveness in solving real-world problems. By the end of this segment, participants will have a robust understanding of supervised learning methodologies, enabling them to make informed decisions and leverage these powerful tools in practical applications.

Learning Objectives

  • Identify tunable model parameters and explain how they affect performance and evaluation metrics
  • Distinguish boosting in ML, specifically for XGBoost models
  • Characterize bagging in ML, specifically for random forest models
  • Explore decision tree models, how they work, and their advantages over other types of supervised ML
  • Determine the different types of supervised learning models

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: ADDITIONAL SUPERVISED LEARNING TECHNIQUES

1. Tree-based learning is a type of unsupervised machine learning that performs classification and regression tasks.

  • True
  • False (CORRECT)

Correct: Tree-based learning is a type of supervised machine learning. It is supervised because it uses labeled datasets to train algorithms to classify or predict outcomes. Unsupervised machine learning uses algorithms to analyze unlabeled data and find underlying structures.

2. Fill in the blank: Similar to a flow chart, a _____ is a classification model that represents various solutions available to solve a given problem based on the possible outcomes of each solution.

  • linear regression
  • decision tree (CORRECT)
  • Poisson distribution
  • binary logistic regression

Correct: A decision tree is a classification model that represents various solutions available to solve a given problem based on the possible outcomes of each solution. Decision trees enable data professionals to make predictions about future events based on currently available information. Binary logistic regression models the probability of an event that has two possible outcomes.
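
For a concrete picture, the short sketch below trains a decision tree with scikit-learn and prints its flow-chart-like structure. The dataset and settings are illustrative only, not part of the quiz.

  from sklearn.datasets import load_iris
  from sklearn.tree import DecisionTreeClassifier, export_text

  # Fit a small classification tree on a built-in example dataset.
  X, y = load_iris(return_X_y=True)
  tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

  # Each printed line is one decision; the indentation mirrors the flow chart.
  print(export_text(tree, feature_names=load_iris().feature_names))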

3. In a decision tree, which node is the location where the first decision is made?

  • Leaf
  • Branch
  • Decision
  • Root (CORRECT)

Correct: In a decision tree, the root node is where the first decision is made. It is the first node in the tree, and all decisions needed to make the prediction stem from it.

4. In tree-based learning, how is a split determined?

  • By which variables and cut-off values offer the most predictive power (CORRECT)
  • By the number of decisions required before arriving at a final prediction
  • By the amount of leaves present
  • By the level of balance present among the predictions made by the model

Correct: In tree-based learning, a split is determined by which variables and cut-off values offer the most predictive power. Repeated splitting eventually leaves groups of data that are mostly or entirely of the same class.
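
As a hedged illustration of "predictive power," the sketch below computes Gini impurity, one common criterion for scoring candidate splits. The gini helper and the toy labels are hypothetical.

  from collections import Counter

  def gini(labels):
      # Gini impurity: 0.0 means the group is entirely one class.
      n = len(labels)
      return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

  parent = ["yes", "yes", "no", "no"]         # maximally mixed: gini = 0.5
  left, right = ["yes", "yes"], ["no", "no"]  # one candidate split

  # A good split drives the impurity of the child groups toward zero.
  print(gini(parent), gini(left), gini(right))  # 0.5 0.0 0.0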

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: TUNE TREE-BASED MODELS

1. Fill in the blank: The hyperparameter max depth is used to limit the depth of a decision tree, which is the number of levels between the _____ and the farthest node away from it.

  • leaf node
  • root node (CORRECT)
  • first split
  • decision node

Correct: The hyperparameter max depth is used to limit the depth of a decision tree, which is the number of levels between the root node and the farthest node away from it. Hyperparameters are parameters that can be set before a model is trained. They can be tuned to improve performance, directly affecting how the model is fit to the data.
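
A minimal sketch of setting this hyperparameter in scikit-learn; the depth value of 2 is an illustrative choice.

  from sklearn.datasets import load_iris
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)

  # max_depth is fixed before training and caps the levels below the root node.
  shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
  print(shallow.get_depth())  # 2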

2. What tuning technique can a data professional use to confirm that a model achieves its intended purpose?

  • Min samples leaf
  • Classifier
  • Grid search (CORRECT)
  • Decision tree

Correct: Grid search is a tool that confirms that a model achieves its intended purpose. It does this by systematically checking every combination of hyperparameters to identify which set produces the best results, based on the selected metric.
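
A sketch of this process in scikit-learn, where GridSearchCV fits and scores every combination in the grid; the parameter grid and scoring metric are illustrative choices.

  from sklearn.datasets import load_iris
  from sklearn.model_selection import GridSearchCV
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5, 10]}

  # Every hyperparameter combination is fit and scored with cross-validation.
  search = GridSearchCV(DecisionTreeClassifier(random_state=0), grid,
                        scoring="f1_macro", cv=5).fit(X, y)
  print(search.best_params_, round(search.best_score_, 3))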

3. During model validation, the validation dataset must be combined with test data in order to function properly.

  • True
  • False (CORRECT)

Correct: During model validation, the validation dataset must be kept separate from the test data. It is a sample of data that is held back during model training and used to compare candidate models before the final test.
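
One hedged way to carve out a separate validation set is two successive calls to scikit-learn's train_test_split; the 60/20/20 proportions here are an illustrative choice.

  from sklearn.datasets import load_iris
  from sklearn.model_selection import train_test_split

  X, y = load_iris(return_X_y=True)

  # First hold out the final test set...
  X_tr, X_test, y_tr, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
  # ...then split what remains into training and validation sets.
  X_train, X_val, y_train, y_val = train_test_split(X_tr, y_tr, test_size=0.25, random_state=0)
  # Candidate models are compared on (X_val, y_val);
  # (X_test, y_test) stays unseen until the very end.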

4. Fill in the blank: Cross validation involves splitting training data into different combinations of _____, on which the model is trained.

  • parcels
  • banks
  • tiers
  • folds (CORRECT)

Correct: Cross validation involves splitting training data into different combinations of folds, on which the model is trained. The process uses these different portions of the data to test and train a model across several iterations.
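
A minimal sketch of 5-fold cross-validation with scikit-learn; each fold takes one turn as the evaluation set while the other four are used for training.

  from sklearn.datasets import load_iris
  from sklearn.model_selection import cross_val_score
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
  print(scores)         # one score per fold
  print(scores.mean())  # the usual summary of performance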

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: BAGGING

1. Ensemble learning is most effective when the outputs are aggregated from models that follow the exact same methodology all using the same dataset.

  • True
  • False (CORRECT)

Correct: Ensemble learning is most effective when the outputs are aggregated from models that follow different methodologies—for instance, a logistic regression, a Naive Bayes model, and a decision tree classifier. In this way, any errors will be uncorrelated.

2. What are some of the benefits of ensemble learning? Select all that apply.

  • It requires few base learners trained on the same dataset.
  • The predictions have less bias than other standalone models. (CORRECT)
  • It combines the results of many models to help make more reliable predictions. (CORRECT)
  • The predictions have lower variance than other standalone models. (CORRECT)

Correct: Ensemble learning combines the results of many models to help make more reliable predictions. Also, these predictions have less bias and lower variance than other standalone models. In order to work, ensemble learning requires numerous base learners, all trained on a random subset of the training data.
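
As an illustrative sketch, scikit-learn's VotingClassifier can aggregate exactly the kinds of dissimilar models mentioned above; the dataset and model settings are illustrative choices.

  from sklearn.datasets import load_iris
  from sklearn.ensemble import VotingClassifier
  from sklearn.linear_model import LogisticRegression
  from sklearn.naive_bayes import GaussianNB
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)

  # Majority vote across three models whose errors should be uncorrelated.
  ensemble = VotingClassifier([
      ("lr", LogisticRegression(max_iter=1000)),
      ("nb", GaussianNB()),
      ("dt", DecisionTreeClassifier(random_state=0)),
  ])
  print(ensemble.fit(X, y).score(X, y))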

3. In a random forest, what type of data is used to train the ensemble of decision-tree base learners?

  • Sampled
  • Unstructured
  • Bootstrapped (CORRECT)
  • Duplicated

Correct: In a random forest, bootstrapped data is used to train the ensemble of decision-tree base learners. Bootstrapping refers to sampling with replacement. So, a random forest model will grow each of its trees by taking a random subset of the available features in the training data, then splitting each node at the best feature available to that tree.
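
A minimal sketch in scikit-learn: bootstrap=True (the default) trains each tree on a sample drawn with replacement, and max_features limits the random feature subset each split may consider. The settings are illustrative.

  from sklearn.datasets import load_iris
  from sklearn.ensemble import RandomForestClassifier

  X, y = load_iris(return_X_y=True)

  forest = RandomForestClassifier(n_estimators=100, bootstrap=True,
                                  max_features="sqrt", random_state=0).fit(X, y)
  print(len(forest.estimators_))  # 100 decision-tree base learners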

4. Fill in the blank: When using a decision tree model, a data professional can use _____ to control the threshold below which nodes become leaves.

  • min_samples_split (CORRECT)
  • max_features
  • min_samples_leaf
  • max_depth

Correct: When using a decision tree model, a data professional can use min_samples_split to control the threshold below which nodes become leaves.
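
A short sketch of this hyperparameter in scikit-learn; the threshold of 40 is an illustrative value.

  from sklearn.datasets import load_iris
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)

  # A node holding fewer than 40 samples is not split further; it becomes a leaf.
  tree = DecisionTreeClassifier(min_samples_split=40, random_state=0).fit(X, y)
  print(tree.get_n_leaves())  # far coarser tree than with the default of 2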

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: BOOSTING

1. When using the hyperparameter min_child_weight, a tree will not split a node if it results in any child node with less weight than what is specified. What happens to the node instead?

  • It gets deleted.
  • It becomes a root.
  • It becomes a leaf. (CORRECT)
  • It duplicates itself to become another node.

Correct: When using the hyperparameter min_child_weight, a tree will not split a node if it results in any child node with less weight than what is specified. Instead, the node becomes a leaf.
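
A hedged sketch using the xgboost library's scikit-learn-style wrapper; the min_child_weight value and the dataset are illustrative choices.

  from sklearn.datasets import load_breast_cancer
  from xgboost import XGBClassifier

  X, y = load_breast_cancer(return_X_y=True)

  # A split is rejected if it would create a child whose total weight falls
  # below min_child_weight; that node is kept as a leaf instead.
  model = XGBClassifier(min_child_weight=10, n_estimators=50).fit(X, y)
  print(model.score(X, y))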

2. Fill in the blank: The supervised learning technique boosting builds an ensemble of weak learners _____, then aggregates their predictions.

  • repeatedly
  • in parallel
  • sequentially (CORRECT)
  • randomly

Correct: The supervised learning technique boosting builds an ensemble of weak learners sequentially, then aggregates their predictions.
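
A sketch of this sequential construction with scikit-learn's GradientBoostingClassifier, whose staged_predict exposes the ensemble's prediction after each added weak learner; the settings are illustrative.

  from sklearn.datasets import load_breast_cancer
  from sklearn.ensemble import GradientBoostingClassifier

  X, y = load_breast_cancer(return_X_y=True)
  gbm = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

  # Training accuracy typically improves as each weak learner is added.
  for i, pred in enumerate(gbm.staged_predict(X), start=1):
      if i % 10 == 0:
          print(i, (pred == y).mean())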

3. When using a gradient boosting machine (GBM) modeling technique, which term describes a model’s ability to predict new values that fall outside of the range of values in the training data?

  • Grid search
  • Learning rate
  • Extrapolation (CORRECT)
  • Cross validation

Correct: When using a gradient boosting machine (GBM) modeling technique, extrapolation describes a model’s ability to predict new values that fall outside of the range of values in the training data.
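
The short sketch below shows why extrapolation is a weakness of tree-based GBMs: trained on a linear trend over x in [0, 10], the model's predictions flatten for inputs outside that range. The synthetic data is illustrative.

  import numpy as np
  from sklearn.ensemble import GradientBoostingRegressor

  rng = np.random.default_rng(0)
  X = rng.uniform(0, 10, size=(200, 1))
  y = 3 * X.ravel()  # a simple linear trend

  gbm = GradientBoostingRegressor(random_state=0).fit(X, y)
  print(gbm.predict([[5.0], [20.0], [100.0]]))
  # ~15 at x=5, but x=20 and x=100 both predict near 30: the trees
  # cannot produce values beyond the range seen in training.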

QUIZ: MODULE 4 CHALLENGE

1. A junior data analyst uses tree-based learning for a sales and marketing project. Currently, they are interested in the section of the tree that represents where the first decision is made. What are they examining?

  • Branches
  • Leaves
  • Roots (CORRECT)
  • Splits

2. What are some disadvantages of decision trees? Select all that apply.

  • Preparing data to train a decision tree is a complex process involving significant preprocessing.
  • Decision trees require assumptions regarding the distribution of underlying data.
  • Decision trees can be particularly susceptible to overfitting. (CORRECT)
  • When new data is introduced, decision trees can be less effective at prediction. (CORRECT)

3. Which section of a decision tree is where the final prediction is made?

  • Decision node
  • Split
  • Leaf node (CORRECT)
  • Root node

4. In a decision tree ensemble model, which hyperparameter controls how many decision trees the model will build for its ensemble?

  • max_features
  • max_depth
  • n_trees
  • n_estimators (CORRECT)

5. What process uses different “folds” (portions) of the data to train and evaluate a model across several iterations?

  • Grid search
  • Model validation
  • Cross validation (CORRECT)
  • Proportional verification

6. Which of the following statements correctly describe ensemble learning? Select all that apply.

  • When building an ensemble using different types of models, each should be trained on completely different data.
  • Predictions using an ensemble of models can be accurate even when the individual models are barely more accurate than a random guess. (CORRECT)
  • Ensemble learning involves aggregating the outputs of multiple models to make a final prediction. (CORRECT)
  • If a base learner’s prediction is only slightly better than a random guess, it is called a “weak learner.” (CORRECT)

7. Fill in the blank: A random forest is an ensemble of decision-tree _____ that are trained on bootstrapped data.

  • statements
  • observations
  • base learners (CORRECT)
  • variables

8. What are some benefits of boosting? Select all that apply.

  • Boosting is the most interpretable model methodology.
  • Boosting is a powerful predictive methodology. (CORRECT)
  • Boosting can handle both numeric and categorical features. (CORRECT)
  • Boosting does not require the data to be scaled. (CORRECT)

9. Which of the following statements correctly describe gradient boosting? Select all that apply.

  • Gradient boosting machines cannot perform classification tasks.
  • Gradient boosting machines have many hyperparameters. (CORRECT)
  • Gradient boosting machines do not give coefficients or directionality for their individual features. (CORRECT)
  • Gradient boosting machines are often called black-box models because their predictions can be difficult to explain. (CORRECT)

10. A data professional uses tree-based learning for an operations project. Currently, they are interested in the nodes at which the trees split. What type of nodes do they examine?

  • Decision (CORRECT)
  • Branch
  • Leaf
  • Root

11. What are some benefits of decision trees? Select all that apply.

  • When working with decision trees, overfitting is unlikely.
  • When preparing data to train a decision tree, very little preprocessing is required. (CORRECT)
  • Decision trees enable data professionals to make predictions about future events based on currently available information. (CORRECT)
  • Decision trees require no assumptions regarding the distribution of underlying data. (CORRECT)

12. In a decision tree, what type(s) of nodes can decision nodes point to? Select all that apply.

  • Split
  • Root node
  • Leaf node (CORRECT)
  • Decision node (CORRECT)

13. In a decision tree model, which hyperparameter sets the threshold below which nodes become leaves?

  • Min child weight
  • Min samples tree
  • Min samples split (CORRECT)
  • Min samples leaf

14. When might you use a separate validation dataset? Select all that apply.

  • If you have very little data.
  • If you want to choose the specific samples used to validate the model. (CORRECT)
  • If you have a very large amount of data. (CORRECT)
  • If you want to compare different model scores to choose a champion before predicting on test holdout data. (CORRECT)

15. What tool is used to confirm that a model achieves its intended purpose by systematically checking combinations of hyperparameters to identify which set produces the best results, based on the selected metric?

  • GridSearchCV (CORRECT)
  • Model validation
  • Cross validation
  • Hyperparameter verification

16. Which of the following statements correctly describe ensemble learning? Select all that apply.

  • If a base learner’s prediction is equally effective as a random guess, it is a strong learner.
  • It’s possible to use the same methodology for each contributing model, as long as there are numerous base learners. (CORRECT)
  • Ensemble learning involves building multiple models. (CORRECT)
  • It’s possible to use very different methodologies for each contributing model. (CORRECT)

17. Which of the following statements correctly describe gradient boosting? Select all that apply.

  • Gradient boosting machines build models in parallel.
  • Gradient boosting machines tell you the coefficients for each feature.
  • Gradient boosting machines work well with missing data. (CORRECT)
  • Gradient boosting machines do not require the data to be scaled. (CORRECT)

18. Which of the following statements accurately describe decision trees? Select all that apply.

  • Decision trees are equally effective at predicting both existing and new data.
  • Decision trees work by sorting data. (CORRECT)
  • Decision trees require no assumptions regarding the distribution of underlying data. (CORRECT)
  • Decision trees are susceptible to overfitting. (CORRECT)

19. What is the only section of a decision tree that contains no predecessors?

  • Leaf node
  • Root node (CORRECT)
  • Decision node
  • Split

20. In a decision tree, nodes are where decisions are made, and they are connected by edges.

  • True (CORRECT)
  • False

Correct: In a decision tree, nodes are where decisions are made, and they are connected by edges. At each node, a single feature of the data is considered and decided on. Edges direct from one node to the next during this process. Eventually, all relevant features will have been resolved, resulting in the classification prediction.

21. Fill in the blank: Each base learner in a random forest model has different combinations of features available to it, which helps prevent correlated errors among _____ in the ensemble.

  • nodes
  • roots
  • learners (CORRECT)
  • splits

22. What are some benefits of boosting? Select all that apply.

  • The models used in boosting can be trained in parallel across many different servers.
  • Boosting reduces bias. (CORRECT)
  • Because no single tree weighs too heavily in the ensemble, boosting reduces the problem of high variance. (CORRECT)
  • Boosting can improve model accuracy. (CORRECT)

23. Which of the following statements correctly describe gradient boosting? Select all that apply.

  • Gradient boosting models can be trained in parallel.
  • Each base learner in the sequence is built to predict the residual errors of the model that preceded it. (CORRECT)
  • Gradient boosting machines can be difficult to interpret. (CORRECT)
  • Gradient boosting machines have difficulty with extrapolation. (CORRECT)

24. A data analytics team uses tree-based learning for a research and development project. Currently, they are interested in the parts of the decision tree that represent an item’s target value. What are they examining?

  • Roots
  • Branches
  • Leaves (CORRECT)
  • Splits

25. In a decision tree model, which hyperparameter specifies the number of attributes that each tree selects randomly from the training data to determine its splits?

  • Learning rate
  • Max features (CORRECT)
  • Number of estimators
  • Max depth

26. AdaBoost is a tree-based boosting methodology in which each consecutive base learner assigns greater weight to the observations that were correctly predicted by the preceding learner.

  • True
  • False (CORRECT)

Correct: AdaBoost is a tree-based boosting methodology in which each consecutive base learner assigns greater weight to the observations that were incorrectly predicted by the preceding learner. It builds its first tree on training data that gives equal weight to each observation. Then, it evaluates which observations were incorrectly predicted by the first tree, increasing the weights for those while decreasing the weights for correct observations. The process repeats until a tree makes a perfect prediction or the ensemble reaches the maximum number of trees.
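
A minimal sketch of this reweighting loop with scikit-learn's AdaBoostClassifier, whose default base learner is a depth-1 decision tree; the dataset and number of estimators are illustrative choices.

  from sklearn.datasets import load_breast_cancer
  from sklearn.ensemble import AdaBoostClassifier

  X, y = load_breast_cancer(return_X_y=True)

  # Each consecutive stump upweights the observations the previous stump
  # misclassified, so later learners focus on the hard cases.
  ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
  print(ada.score(X, y))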

27. Why might a GBM, or gradient-boosting machine, be inappropriate for use in the health care or financial fields?

  • Its predictions cannot be precisely explained. (CORRECT)
  • It doesn’t perform well with missing data.
  • It requires the data to be scaled.
  • It is inaccurate.

Correct: A GBM may be inappropriate for use in the health care or financial fields because its predictions cannot be precisely explained. These are often called black-box models.

CONCLUSION – Tree-Based Modeling

In conclusion, this section has provided an in-depth exploration of supervised learning within the realm of machine learning. Participants have gained valuable insights into testing and validating the performance of key models, including decision trees, random forests, and gradient boosting.

The acquired knowledge empowers learners to navigate and apply these supervised learning techniques effectively, fostering their ability to address complex challenges and make data-driven decisions. This comprehensive overview sets the stage for participants to confidently integrate supervised learning methodologies into their repertoire, contributing to their proficiency in the dynamic field of machine learning.