COURSE 1: CREATE MACHINE LEARNING MODELS IN MICROSOFT AZURE

Module 2: Train And Evaluate Classification And Clustering Models

MICROSOFT AZURE DATA SCIENTIST ASSOCIATE (DP-100) PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

Last updated: July 9, 2024

INTRODUCTION – Train And Evaluate Classification And Clustering Models

Classification is a type of machine learning used to sort items into categories. In this module, you will learn how to build a machine learning model that predicts these categories using classification techniques, and you will use the scikit-learn framework in Python to train and evaluate it. You will also explore clustering, an unsupervised machine learning method that groups data observations into clusters, and use scikit-learn to train a clustering model.

Learning Objectives

When to use classification
How to train and evaluate a classification model using the scikit-learn framework
When to use clustering
How to train and evaluate a clustering model using the scikit-learn framework

PRACTICE QUIZ: KNOWLEDGE CHECK 1

1. True or False? Classification is an unsupervised machine learning technique.

True
False (CORRECT)

Correct: Classification is a form of supervised machine learning in which you train a model to use the features to predict a label.

2. Complete the statement:

Classification is a form of machine learning in which you train a model to predict […] of an item.

The category (CORRECT)
The feature
The numeric value

Correct: Classification uses known features and labels to predict the category an item belongs to.
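As a minimal sketch of this idea, assuming scikit-learn is installed and using its bundled Iris dataset purely for illustration, the model below learns from known features and labels and then predicts the category of a new item. The choice of LogisticRegression and the sample measurements are assumptions for the example, not part of the course material.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load a small labeled dataset: X holds the features, y holds the known category labels
iris = load_iris()
X, y = iris.data, iris.target

# Train a classifier that maps features to a category (supervised learning)
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Predict the category of a new item from its four measured features
new_item = [[5.1, 3.5, 1.4, 0.2]]
predicted_class = model.predict(new_item)[0]
print(iris.target_names[predicted_class])
```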

3. Which Python package contains the train_test_split function?

Scikit-learn (CORRECT)
Matplotlib
Tensorflow
Numpy

Correct: In Python, the scikit-learn package contains a large number of functions we can use to build a machine learning model – including a train_test_split function that randomly splits the data into training and test subsets.

4. How would you split your data for training and testing to ensure the model performs well?

30% training and 70% testing
50% training and 50% testing
70% training and 30% testing (CORRECT)

Correct: A 70/30 split is a common choice that leaves the model enough data to learn from while holding back enough unseen data to evaluate it.
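A short sketch of a 70/30 split with scikit-learn's train_test_split, again assuming the bundled Iris data for illustration. Setting test_size=0.30 holds back 30% of the rows; random_state=0 is an arbitrary seed chosen here only so the shuffle is reproducible.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold back 30% of the rows for testing; random_state makes the random shuffle repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)
print(f"Training cases: {X_train.shape[0]}, test cases: {X_test.shape[0]}")

# Fit only on the training split, then evaluate on the held-out test split
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Evaluating on rows the model never saw during fitting is what makes the test score a fair estimate of how it will perform on new data.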

5. For machine learning algorithms, what are parameters generally referred to as?

Static parameters
Hyperparameters (CORRECT)
Superparameters

Correct: Parameters for machine learning algorithms are generally referred to as hyperparameters (to a data scientist, parameters are values learned from the data itself – hyperparameters are set externally, before training!).
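To make the distinction concrete, here is a small sketch (assuming scikit-learn and the Iris data, with C=0.5 picked arbitrarily): the hyperparameters are passed when the estimator is created, while the coefficients are learned from the data during fit.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen by the data scientist before training, not learned from the data
model = LogisticRegression(C=0.5, max_iter=1000)
print(model.get_params()["C"])  # the regularization strength we chose

# Fitting learns the model's internal parameters (coefficients) from the data
model.fit(X, y)
print(model.coef_.shape)  # one row of learned coefficients per class, one column per feature
```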

PRACTICE QUIZ: KNOWLEDGE CHECK 2

1. True or False?

Clustering is an example of a supervised machine learning technique.

True
False (CORRECT)

Correct: Clustering is an ‘unsupervised’ method, where ‘training’ is done without labels. Instead, models identify examples that have a similar collection of features.

2. When working on a clustering model, if you want to measure how tightly the data points are grouped, what metric should you use?

R-squared (R2)
Receiver operating characteristic (ROC) curve
F1 score
Within cluster sum of squares (WCSS) (CORRECT)

Correct: A metric often used to measure this tightness is the within cluster sum of squares (WCSS), with lower values meaning that the data points are closer.
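In scikit-learn, a fitted KMeans model exposes WCSS as its inertia_ attribute. The sketch below, which assumes synthetic data from make_blobs for illustration, checks that value against a hand computation and shows that tighter grouping (three clusters instead of one) gives a lower WCSS.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with three groups (illustrative only)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# inertia_ is the sum of squared distances from each point to its assigned centroid
wcss = model.inertia_

# The same quantity computed by hand, cluster by cluster
manual_wcss = sum(
    np.sum((X[model.labels_ == k] - center) ** 2)
    for k, center in enumerate(model.cluster_centers_)
)

# Lumping all points into one cluster leaves them far less tightly grouped
wcss_one_cluster = KMeans(n_clusters=1, n_init=10, random_state=42).fit(X).inertia_
print(f"WCSS (3 clusters): {wcss:.2f}, manual: {manual_wcss:.2f}, 1 cluster: {wcss_one_cluster:.2f}")
```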

3. This clustering algorithm separates a dataset into clusters of equal variances, where the number of clusters is user-defined.

Which clustering algorithm is this?

Logistic regression
K-means (CORRECT)
Hierarchical

Correct: This is a commonly used clustering algorithm that separates a dataset into K clusters of equal variances. The number of clusters, K, is user defined.

4. Select the correct steps that a basic K-means clustering algorithm consists of:

A set of K centroids are specifically chosen.
Clusters are formed by assigning the data points to a random centroid.
The mean of each cluster is computed and the centroid is moved to that mean. (CORRECT)
When the clusters stop changing, the algorithm has converged. (CORRECT)
Steps 2 and 3 (b & c here) are repeated until a stopping criterion is met. (CORRECT)

Correct: Once every data point is assigned to a cluster, each centroid is moved to the mean of the points in that cluster.

Correct: When the clusters stop changing, the locations of the clusters are fixed. Note that because the starting centroids are chosen at random, re-running the algorithm could produce slightly different clusters, so training usually involves several runs, reinitializing the centroids each time, and the model with the lowest WCSS is selected.

Correct: Steps 2 and 3 in their correct form are repeated until the stopping criterion is met. Typically, the algorithm terminates when each new iteration results in negligible movement of the centroids and the clusters become static.
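The steps above can be sketched in plain NumPy. This is an illustrative, single-run implementation written for this guide (the function name kmeans and its tol threshold are assumptions), not scikit-learn's internal code; in practice you would use sklearn.cluster.KMeans, which also handles multiple random restarts.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Basic single-run K-means following the steps above (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Step 1: choose K starting centroids at random from the data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: move each centroid to the mean of the points assigned to it
        # (keep the old centroid if a cluster ends up empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids barely move (the clusters are static)
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    return labels, centroids

# Two obvious groups of points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans(X, k=2)
print(labels, centroids)
```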

5. Which of the following algorithms are considered to be clustering-type algorithms?

Decision Tree
K-Means (CORRECT)
Hierarchical (CORRECT)

Correct: K-Means is a clustering-type algorithm.

Correct: Hierarchical clustering is a clustering-type algorithm.

QUIZ: TEST PREP

1. Which type of machine learning model can be trained using the Support Vector Machine algorithm?

Classification (CORRECT)
Clustering
Regression

Correct: Support Vector Machines are a well-established algorithm for classification.
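A minimal sketch of training an SVM classifier with scikit-learn's SVC, assuming the bundled breast cancer dataset for illustration. Standardizing the features first is a common practice for SVMs, chosen here as an assumption rather than something the course prescribes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# SVMs are sensitive to feature scale, so standardize features before the classifier
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```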

2. When using the classification report from sklearn.metrics to evaluate the performance of your model, what does the F1-Score metric provide?

An average metric that takes both precision and recall into account. (CORRECT)
Out of all of the instances of this class in the test dataset, how many did the model identify.
How many instances of this class are there in the test dataset.
Of the predictions the model made for this class, what proportion were correct.

Correct: The F1-score is the harmonic mean of precision and recall, so it is high only when both are high.
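The relationship can be checked with sklearn.metrics. The labels below are made up for illustration; the sketch computes precision, recall, and F1 separately and also prints the full classification report.

```python
from sklearn.metrics import classification_report, f1_score, precision_score, recall_score

# Hypothetical true and predicted labels for a binary classifier
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0]

precision = precision_score(y_true, y_pred)  # 3 of 4 predicted positives are correct
recall = recall_score(y_true, y_pred)        # 3 of 5 actual positives were found
f1 = f1_score(y_true, y_pred)

# F1 is the harmonic mean of precision and recall
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(classification_report(y_true, y_pred))
```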

3. The Precision and Recall metrics are derived from four possible prediction outcomes.

If the predicted label is 1, but the actual label is 0, what would the outcome be?

False Negative
True Negative
False Positive (CORRECT)
True Positive

Correct: A false positive occurs when the model predicts the positive class (1) but the actual label is 0.
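The four outcomes can be read off a confusion matrix. In this sketch, with hypothetical labels, scikit-learn's confusion_matrix for binary labels 0 and 1 is flattened in the order tn, fp, fn, tp, and precision and recall are then derived from those counts.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels
y_true = [0, 1, 0, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 1, 1]

# For binary labels [0, 1], ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of the predicted 1s, how many were actually 1
recall = tp / (tp + fn)     # of the actual 1s, how many the model found
print(f"TN={tn} FP={fp} FN={fn} TP={tp} precision={precision:.2f} recall={recall:.2f}")
```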

4. In multiclass classification, what are the two ways in which you can approach a problem?

Rest minus One
One and Rest
One vs One (CORRECT)
One vs Rest (CORRECT)

Correct: One vs One (OVO), in which a classifier for each possible pair of classes is created.

Correct: One vs Rest (OVR), in which a classifier is created for each possible class value, treating cases of that class as positive and cases of any other class as negative.
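Both strategies are available as wrappers in sklearn.multiclass. The sketch below assumes synthetic four-class data and a LogisticRegression base model, both chosen only for illustration. With 4 classes, OVR trains one binary classifier per class (4), while OVO trains one per pair of classes (4 × 3 / 2 = 6).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic data with 4 classes (illustrative only)
X, y = make_classification(n_samples=400, n_features=8, n_informative=6,
                           n_classes=4, random_state=0)

# One vs Rest: one binary classifier per class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# One vs One: one binary classifier per pair of classes
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_), len(ovo.estimators_))
```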

5. Hierarchical clustering creates clusters using two methods.

Which are those two methods?

Aggregational
Distinctive
Agglomerative (CORRECT)
Divisive (CORRECT)

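Correct: Agglomerative clustering is a “bottom up” approach that starts with each data point as its own cluster and merges the closest clusters in a stepwise manner.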

Correct: The divisive method is a “top down” approach starting with the entire dataset and then finding partitions in a stepwise manner.
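scikit-learn implements the bottom-up method as AgglomerativeClustering. A minimal sketch, assuming synthetic blob data and Ward linkage (an illustrative choice; other linkage options exist):

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=1)

# Agglomerative ("bottom up"): every point starts as its own cluster, and the
# closest clusters are merged step by step until 3 clusters remain
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)
print(sorted(set(labels.tolist())))
```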

6. With which kind of machine learning is the K-Means clustering algorithm associated?

Reinforcement learning
Unsupervised machine learning (CORRECT)
Supervised machine learning

Correct: Clustering is a form of unsupervised machine learning in which the training data does not include known labels.

7. You are using the scikit-learn library to train a K-Means clustering model that groups observations into four clusters. How should you create the K-Means object?

model = KMeans(max_iter=4)
model = KMeans(n_init=4)
model = KMeans(n_clusters=4) (CORRECT)

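Correct: The n_clusters parameter determines the number of clusters. By contrast, max_iter caps the number of iterations in a single run, and n_init sets how many times the algorithm is run with different centroid seeds.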
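A short sketch of the correct call, assuming synthetic two-dimensional data from make_blobs for illustration. The n_init=10, max_iter=300, and random_state=0 values are arbitrary choices included only to show what those other parameters control.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=4, n_features=2, random_state=0)

# n_clusters sets K; n_init is how many times K-means restarts from new random
# centroids (the run with the lowest WCSS is kept); max_iter caps iterations per run
model = KMeans(n_clusters=4, n_init=10, max_iter=300, random_state=0)
labels = model.fit_predict(X)

print(model.cluster_centers_.shape)  # one centroid per cluster, one column per feature
```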

CONCLUSION – Train And Evaluate Classification And Clustering Models

In conclusion, understanding classification and clustering techniques is essential for tackling a wide range of machine learning problems. This module provides you with the knowledge to create and evaluate classification models to predict categories and clustering models to group data observations. Using the scikit-learn framework in Python, you will develop practical skills in both supervised and unsupervised machine learning, laying a solid foundation for your future endeavors in data science.
