COURSE 1: CREATE MACHINE LEARNING MODELS IN MICROSOFT AZURE
Module 2: Train And Evaluate Classification And Clustering Models
MICROSOFT AZURE DATA SCIENTIST ASSOCIATE (DP-100) PROFESSIONAL CERTIFICATE
Complete Coursera Study Guide
INTRODUCTION – Train And Evaluate Classification And Clustering Models
Classification is a type of machine learning used to sort items into different categories. In this module, you will learn how to build a machine learning model that predicts these categories using classification techniques. You will use the scikit-learn framework in Python to train and evaluate your classification model. You will also explore clustering, an unsupervised machine learning method that groups data observations into clusters, and use scikit-learn to train a clustering model.
Learning Objectives
- When to use classification
- How to train and evaluate a classification model using the scikit-learn framework
- When to use clustering
- How to train and evaluate a clustering model using the scikit-learn framework
PRACTICE QUIZ: KNOWLEDGE CHECK 1
1. True or False? Classification is an unsupervised machine learning technique.
- True
- False (CORRECT)
Correct: Classification is a form of supervised machine learning in which you train a model to use the features to predict a label.
2. Complete the statement:
Classification is a form of machine learning in which you train a model to predict […] of an item.
- The category (CORRECT)
- The feature
- The numeric value
Correct: Classification uses known features and labels to predict the category an item belongs to.
3. Which Python package contains the train_test_split function?
- Scikit-learn (CORRECT)
- Matplotlib
- TensorFlow
- NumPy
Correct: In Python, the scikit-learn package contains a large number of functions we can use to build a machine learning model – including a train_test_split function that ensures we get a statistically random split of training and test data.
4. How would you split your data for training and testing to ensure the model performs well?
- 30% training and 70% testing
- 50% training and 50% testing
- 70% training and 30% testing (CORRECT)
Correct: A 70/30 split is a common choice. It gives the model most of the data to learn from while holding back enough unseen data to test it.
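The 70/30 split from the question above can be sketched with scikit-learn's train_test_split function. The data here is synthetic and the variable names are illustrative assumptions, not part of the course material.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 100 rows, 3 features, a binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# Hold back 30% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

print("Training rows:", X_train.shape[0])
print("Test rows:", X_test.shape[0])
```

The model is then fitted on X_train and y_train only, and X_test and y_test are kept aside for evaluation.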
5. For machine learning algorithms, what are parameters generally called?
- Static parameters
- Hyperparameters (CORRECT)
- Superparameters
Correct: Parameters for machine learning algorithms are generally referred to as hyperparameters (to a data scientist, parameters are values in the data itself – hyperparameters are defined externally from the data!).
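As a rough illustration of the distinction, the sketch below sets one hyperparameter on a scikit-learn LogisticRegression before training and then reads back the values the model learned from the data. The choice of algorithm, the value C=0.1, and the synthetic dataset are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data for illustration only.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# C (inverse regularization strength) is a hyperparameter:
# it is chosen before training and is not learned from the data.
model = LogisticRegression(C=0.1, max_iter=1000)
model.fit(X, y)

# coef_ and intercept_ are fitted from the data during fit().
print("Hyperparameter C:", model.get_params()["C"])
print("Learned coefficients:", model.coef_)
print("Learned intercept:", model.intercept_)
```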
PRACTICE QUIZ: KNOWLEDGE CHECK 2
1. True or False?
Clustering is an example of a supervised machine learning technique.
- True
- False (CORRECT)
Correct: Clustering is an ‘unsupervised’ method, where ‘training’ is done without labels. Instead, models identify examples that have a similar collection of features.
2. When working on a clustering model, if you want to measure how tightly the data points are grouped, what metric should you use?
- R-squared (R2)
- Receiver operating characteristic (ROC) curve
- F1 score
- Within cluster sum of squares (WCSS) (CORRECT)
Correct: A metric often used to measure this tightness is the within cluster sum of squares (WCSS), with lower values meaning that the data points are closer.
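A minimal sketch of the WCSS calculation, assuming synthetic blob data: it sums the squared distance from each point to its assigned centroid and compares the result with the inertia_ attribute that scikit-learn's KMeans exposes for the same quantity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data for illustration only.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# WCSS by hand: squared distance from each point to its assigned centroid, summed.
assigned_centroids = model.cluster_centers_[model.labels_]
wcss = float(np.sum((X - assigned_centroids) ** 2))

print("WCSS computed by hand:", wcss)
print("KMeans inertia_:", model.inertia_)
```

Comparing this value across models trained with different numbers of clusters is one common way to judge how tightly the points are grouped.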
3. This clustering algorithm separates a dataset into clusters of equal variances, where the number of clusters is user-defined.
Which clustering algorithm is this?
- Logistic regression
- K-means (CORRECT)
- Hierarchical
Correct: K-means is a commonly used clustering algorithm that separates a dataset into K clusters of equal variances. The number of clusters, K, is user-defined.
4. Select the correct steps that a basic K-means clustering algorithm consists of:
- A set of K centroids are specifically chosen.
- Clusters are formed by assigning the data points to a random centroid.
- The mean of each cluster is computed and the centroid is moved to that mean. (CORRECT)
- When the clusters stop changing, the algorithm has converged. (CORRECT)
- Steps 2 and 3 (b & c here) are repeated until a stopping criterion is met. (CORRECT)
Correct: This is correct.
Correct: This is correct. When the clusters stop changing, the locations of the clusters are defined. Note that the random starting point for the centroids means that re-running the algorithm could result in slightly different clusters. Training therefore usually involves multiple runs, reinitializing the centroids each time, and the model with the best (lowest) WCSS is selected.
Correct: Steps 2 and 3 in their correct form are repeated until the stopping criterion is met. Typically, the algorithm terminates when each new iteration results in negligible movement of the centroids and the clusters become static.
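The steps in their correct form (random initial centroids, assignment to the closest centroid, centroid update, repeat until nothing moves) can be sketched in plain NumPy. The function name kmeans, its parameters, and the synthetic data are hypothetical and for illustration only; this is not how scikit-learn implements the algorithm internally.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Basic K-means sketch: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick K distinct data points at random as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated synthetic blobs (illustrative data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)
```

Because the starting centroids are random, a different seed can give a different result on less cleanly separated data, which is why several runs are usually compared by WCSS.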
5. Which of the following algorithms are considered to be clustering-type algorithms?
- Decision Tree
- K-Means (CORRECT)
- Hierarchical (CORRECT)
Correct: K-Means is a clustering-type algorithm.
Correct: Hierarchical clustering is a clustering-type algorithm.
QUIZ: TEST PREP
1. Which type of machine learning model can be trained using the Support Vector Machine algorithm?
- Classification (CORRECT)
- Clustering
- Regression
Correct: Support Vector Machine is a well-established algorithm for classification.
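For the Support Vector Machine question above, a minimal sketch of training a classifier with scikit-learn's SVC might look like this. The synthetic dataset, the RBF kernel, and the 70/30 split are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data for illustration only.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Train a support vector classifier on the training portion.
model = SVC(kernel="rbf")
model.fit(X_train, y_train)

# Predict a category (0 or 1) for each held-back observation.
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print("Accuracy on the test data:", accuracy)
```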
2. When using the classification report from sklearn.metrics to evaluate the performance of your model, what does the F1-Score metric provide?
- An average metric that takes both precision and recall into account. (CORRECT)
- Out of all of the instances of this class in the test dataset, how many did the model identify.
- How many instances of this class are there in the test dataset.
- Of the predictions the model made for this class, what proportion were correct.
Correct: This is what the F1-Score provides.
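To make the relationship concrete, the sketch below computes precision, recall, and F1 for a small hand-made set of labels (an assumption for illustration) and checks that F1 equals the harmonic mean of the other two. It also prints the full classification report, whose columns correspond to the answer options above: precision, recall, f1-score, and support.

```python
from sklearn.metrics import (
    classification_report,
    f1_score,
    precision_score,
    recall_score,
)

# Hand-made labels for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)

# F1 is the harmonic mean of precision and recall.
f1_by_hand = 2 * precision * recall / (precision + recall)

print(classification_report(y_true, y_pred))
print("F1 from scikit-learn:", f1)
print("F1 by hand:", f1_by_hand)
```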
3. The Precision and Recall metrics are derived from four possible prediction outcomes.
If the predicted label is 1, but the actual label is 0, what would the outcome be?
- False Negative
- True Negative
- False Positive (CORRECT)
- True Positive
Correct: This outcome happens when the predicted label is 1, but the actual label is 0.
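The four outcomes can be counted with scikit-learn's confusion_matrix function. The labels below are made up for illustration; the first pair (actual 0, predicted 1) is the false positive case from the question.

```python
from sklearn.metrics import confusion_matrix

# Hand-made labels for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0]

# For binary labels [0, 1], ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

print("True negatives:", tn)
print("False positives:", fp)
print("False negatives:", fn)
print("True positives:", tp)
```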
4. In multiclass classification, what are the two ways in which you can approach a problem?
- Rest minus One
- One and Rest
- One vs One (CORRECT)
- One vs Rest (CORRECT)
Correct: One vs One (OVO), in which a classifier for each possible pair of classes is created.
Correct: One vs Rest (OVR), in which a classifier is created for each possible class value, with a positive outcome for cases where the prediction is this class, and negative predictions for cases where the prediction is any other class.
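One way to see the difference between the two approaches is to count how many binary classifiers each one builds. This sketch uses scikit-learn's OneVsRestClassifier and OneVsOneClassifier wrappers around logistic regression on a synthetic four-class dataset; the dataset and the base estimator are assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic data with four classes, for illustration only.
X, y = make_classification(
    n_samples=400,
    n_features=6,
    n_informative=4,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)

# One vs Rest: one binary classifier per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# One vs One: one binary classifier per pair of classes.
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print("OVR classifiers:", len(ovr.estimators_))
print("OVO classifiers:", len(ovo.estimators_))
```

With four classes, OVR trains four classifiers, while OVO trains one for each of the six possible class pairs.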
5. Hierarchical clustering creates clusters using two methods.
Which are those two methods?
- Aggregational
- Distinctive
- Agglomerative (CORRECT)
- Divisive (CORRECT)
Correct: Agglomerative clustering is a “bottom up” approach.
Correct: The divisive method is a “top down” approach starting with the entire dataset and then finding partitions in a stepwise manner.
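The bottom-up approach can be sketched with scikit-learn's AgglomerativeClustering class. The synthetic data and the choice of three clusters are assumptions for illustration, and the divisive approach is not shown here.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Synthetic data for illustration only.
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Start with every point as its own cluster and merge the closest
# clusters step by step until three clusters remain.
model = AgglomerativeClustering(n_clusters=3)
labels = model.fit_predict(X)

print(labels[:10])
```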
6. With which kind of machine learning is the K-Means clustering algorithm associated?
- Reinforcement learning
- Unsupervised machine learning (CORRECT)
- Supervised machine learning
Correct: Clustering is a form of unsupervised machine learning in which the training data does not include known labels.
7. You are using the scikit-learn library to train a K-Means clustering model that groups observations into four clusters. How should you create the K-Means object?
- model = KMeans(max_iter=4)
- model = KMeans(n_init=4)
- model = KMeans(n_clusters=4) (CORRECT)
Correct: The n_clusters parameter determines the number of clusters.
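A short sketch of the correct answer in use, assuming synthetic data with four natural groups. The extra arguments are optional and shown only to contrast them with the other options: in scikit-learn, n_init sets how many times the algorithm is run with different starting centroids, and max_iter caps the iterations of a single run. Neither controls the number of clusters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data for illustration only.
X, _ = make_blobs(n_samples=400, centers=4, random_state=0)

# n_clusters sets how many clusters the model groups the observations into.
model = KMeans(n_clusters=4, n_init=10, max_iter=300, random_state=0)
clusters = model.fit_predict(X)

print("Centroid array shape:", model.cluster_centers_.shape)
print("Cluster assigned to the first five rows:", clusters[:5])
```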
CONCLUSION – Train And Evaluate Classification And Clustering Models
In conclusion, understanding classification and clustering techniques is essential for tackling a wide range of machine learning problems. This module provides you with the knowledge to create and evaluate classification models to predict categories and clustering models to group data observations. Using the scikit-learn framework in Python, you will develop practical skills in both supervised and unsupervised machine learning, laying a solid foundation for your future endeavors in data science.

