COURSE 6: THE NUTS AND BOLTS OF MACHINE LEARNING

Module 3: Unsupervised Learning Techniques

GOOGLE ADVANCED DATA ANALYTICS PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

INTRODUCTION – Unsupervised Learning Techniques

This module introduces unsupervised learning, a major branch of machine learning. It begins with the distinctions between supervised and unsupervised techniques and the advantages and applications of each approach. It then covers the principles that underpin unsupervised learning before turning to its practical applications.

The module focuses on clustering, and on K-means in particular as a widely used clustering model. Participants study the theory behind K-means and work through hands-on activities to build the skills needed to apply it in real-world scenarios, including data analysis and pattern recognition.

Learning Objectives

  • Optimize the results of a K-means model
  • Evaluate K-means results
  • Code a K-means algorithm in Python
  • Articulate how unsupervised learning differs from supervised learning

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: EXPLORE UNSUPERVISED LEARNING AND K-MEANS

1. Fill in the blank: K-means is an unsupervised partitioning algorithm used to organize _____ data into clusters.

  • Unlabeled (CORRECT)
  • hierarchical
  • subcategorized
  • presorted

Correct: K-means is an unsupervised partitioning algorithm used to organize unlabeled data into clusters. It does this by creating a logical scheme to make sense of the data.

2. In k-means, what term describes the point at which each cluster is defined?

  • Commonality
  • Core
  • Centroid (CORRECT)
  • Coordinate

Correct: In k-means, the centroid is the point at which each cluster is defined. Its position represents the center of the cluster, also known as the mathematical mean.

3. A data professional is repeating certain tasks that will enable them to create a k-means model. They continue doing this until the algorithm converges. Which step of the model-building process does this scenario represent?

  • Step three
  • Step two
  • Step four (CORRECT)
  • Step one

Correct: Step four involves repeating steps two and three until the algorithm converges.
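The four steps referred to in these questions can be sketched in plain Python. This is a minimal illustration, not the course's own code: the function name kmeans, the toy points, and the choice to start the centroids on randomly picked data points are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Toy k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step one: choose k and place the centroids (here: k distinct data points).
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Step two: assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step three: recalculate each centroid as the mean of its points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Step four: repeat steps two and three until the centroids stop moving.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Made-up toy data: two visibly separate groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

On this toy data the first three points end up sharing one label and the last three the other. Which group is numbered 0 depends on the random starting centroids.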

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: EVALUATE A K-MEANS MODEL

1. In a k-means model, which evaluation metric represents the sum of the squared distances between each observation and its closest centroid?

  • SMAPE
  • F1-score
  • Silhouette score
  • Inertia (CORRECT)

Correct: Inertia represents the sum of the squared distances between each observation and its closest centroid. It is used to measure intracluster distances by gauging how closely related each observation is to the other observations within its own cluster.
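As a rough illustration of that definition, inertia can be computed directly from a set of points and centroids. The helper name inertia and the toy coordinates below are assumptions chosen so the arithmetic is easy to check by hand.

```python
import math


def inertia(points, centroids):
    """Sum of squared distances from each point to its closest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


# Made-up toy data: every point sits exactly 1 unit from its nearest centroid.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]
print(inertia(points, centroids))  # four squared distances of 1.0 each -> 4.0
```

A lower value means the observations sit closer to their centroids, so the clusters are tighter.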

2. Fill in the blank: A data professional may use the _____ method to choose an optimal value for k. This is a tool for identifying the point at which the decrease in inertia starts to level off.

  • Elbow (CORRECT)
  • clustering
  • unsupervised learning
  • partitioning

Correct: A data professional may use the elbow method to choose an optimal value for k. It is a tool for identifying the point at which the decrease in inertia starts to level off.

3. A data professional is using Scikit-learn to create a k-means model. Which attribute will enable them to get the cluster assignments?

  • Fit
  • Inertia
  • Silhouette score
  • Labels (CORRECT)

Correct: The labels attribute (named labels_ in scikit-learn) will enable them to get the cluster assignments. It returns an array of values that is the same length as the training data. Each value corresponds to the number of the cluster to which that point is assigned.
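The elbow method and the labels attribute can be tried together in a short scikit-learn sketch. It assumes NumPy and scikit-learn are installed. The helper kmeans_inertia is a hypothetical function written to match the description in this guide, and the synthetic blobs, n_init=10 and random_state=42 are illustrative choices only.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_inertia(num_clusters, x_vals):
    """Fit one k-means model per value of k and return the list of inertias."""
    inertias = []
    for k in num_clusters:
        model = KMeans(n_clusters=k, n_init=10, random_state=42)
        model.fit(x_vals)
        inertias.append(model.inertia_)
    return inertias


# Synthetic data for illustration: three well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 10], [0, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Inertia for k = 1..6; plotting these against k gives the elbow plot.
k_values = range(1, 7)
inertias = kmeans_inertia(k_values, X)
for k, value in zip(k_values, inertias):
    print(k, round(value, 1))

# labels_ holds one cluster number per training row.
model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(model.labels_.shape)
print(model.labels_[:5])
```

With data like this, the inertia should fall steeply up to k = 3 and then level off, which is the bend the elbow method looks for.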

QUIZ: MODULE 3 CHALLENGE

1. Which of the following statements correctly describe key aspects of k-means? Select all that apply.

  • The clustering process has four steps that repeat until the model disperses evenly.
  • Poor clustering is caused by local minima, which means there is not an appropriate distance between clusters. (CORRECT)
  • K-means groups unlabeled data into k clusters based on their similarities. (CORRECT)
  • K-means organizes data by creating a logical scheme to make sense of it. (CORRECT)

2. A data professional chooses the number of centroids to use in a k-means model and places them in the data space. Which step of the model-creation process is the data professional working in?

  • Step one (CORRECT)
  • Step two
  • Step three
  • Step four

3. Fill in the blank: In order to evaluate the intracluster space in a k-means model, a data professional uses the inertia metric. This is the _____ of the squared distances between each observation and its nearest centroid.

  • Ratio
  • difference
  • average
  • sum (CORRECT)

4. A data analyst creates a k-means model. They observe a silhouette score coefficient with a value of zero. What conclusion should they draw in this scenario?

  • The observation is on the boundary between clusters. (CORRECT)
  • The observation may be in the wrong cluster.
  • The observation is suitably within its own cluster and well separated from other clusters.
  • The observation is in an appropriate cluster.

5. Which Python function fits a k-means model for multiple values of k by calculating the inertia for each value, appending it to a list, and returning that list?

  • k-means inertia (CORRECT)
  • silhouette score
  • labels
  • cluster_image

6. Which of the following statements accurately describe the elbow method? Select all that apply.

  • With k-means models, the elbow method is used to find all similar values of k.
  • The model that will provide the most meaningful clustering of data has inertia that is dropping significantly with added clusters. (CORRECT)
  • The elbow method helps data professionals decide which clustering gives the most meaningful model. (CORRECT)
  • The elbow method uses a line plot to visually compare the inertias of different models. (CORRECT)

7. Which of the following statements correctly describe key aspects of k-means? Select all that apply.

  • The value of k is a standard that never changes.
  • K-means is an unsupervised partitioning algorithm. (CORRECT)
  • To avoid poor clustering, data professionals run a k-means model with different starting positions for the centroids. (CORRECT)
  • K-means clusters are defined by a central point, called a centroid. (CORRECT)

8. A junior data analyst building a K-means model recalculates the centroid of each cluster. Which step of the model-creation process are they working in?

  • Step one
  • Step two
  • Step three (CORRECT)
  • Step four

9. Which Python function would a data professional use to compare the inertias of multiple k values?

  • k-means inertia (CORRECT)
  • labels
  • silhouette score
  • cluster_image

10. Which of the following statements accurately describe the elbow method? Select all that apply.

  • When using the elbow method, data professionals aim to find the smoothest part of the curve.
  • The elbow method uses a line plot to visually compare the inertias of different models. (CORRECT)
  • There is not always an obvious elbow. (CORRECT)
  • The sharpest bend in the curve is usually the model that will provide the most meaningful clustering of data. (CORRECT)

11. A data analytics team building a k-means model assigns each data point to its nearest centroid. Which step of the model-creation process are they working in?

  • Step one
  • Step two (CORRECT)
  • Step three
  • Step four

12. Fill in the blank: In order to evaluate the _____ space in a k-means model, a data professional uses the inertia metric. This is the sum of the squared distances between each observation and its nearest centroid.

  • Intracluster (CORRECT)
  • midpoint
  • converged
  • intercluster

13. Which of the following statements correctly describe key aspects of k-means? Select all that apply.

  • K-means is a supervised partitioning algorithm.
  • K-means organizes unlabeled data into clusters. (CORRECT)
  • The position of the k-means centroid is the center of the cluster, also known as the mathematical mean. (CORRECT)
  • The k-means clustering process has four steps that repeat until the model converges. (CORRECT)

14. Fill in the blank: In order to evaluate the intracluster space in a k-means model, a data professional uses the _____ metric. This is the sum of the squared distances between each observation and its nearest centroid.

  • spread
  • inertia (CORRECT)
  • convergence
  • silhouette score

15. A junior data professional creates a k-means model. They observe a silhouette score coefficient with a value close to negative one. What conclusion should they draw in this scenario?

  • The observation is in the correct cluster.
  • The observation is on the boundary between clusters.
  • The observation is suitably within its own cluster and well separated from other clusters.
  • The observation may be in the wrong cluster. (CORRECT)

16. When using k-means, the value of k is always the same, no matter how many clusters are necessary for a project.

  • True
  • False (CORRECT)

Correct: When using k-means, the value for k is a decision that the modeler makes. Sometimes the data professional will have an idea about the number of clusters necessary for a project. Other times, it will be necessary to try different values to determine which one provides the best results.

17. What are the characteristics of an effective clustering model? Select all that apply.

  • The clusters are overlapping.
  • The clusters are clearly identifiable. (CORRECT)
  • Within each intracluster, the points are close to each other. (CORRECT)
  • Within each intercluster, there is lots of empty space. (CORRECT)

Correct: For an effective clustering model, the clusters should be clearly identifiable. Within each cluster (the intracluster space), the points are close to each other; between clusters (the intercluster space), there is lots of empty space.

18. Fill in the blank: Silhouette score is the _____ of the silhouette coefficients of all the observations in a model.

  • value
  • sum
  • range
  • mean (CORRECT)

Correct: Silhouette score is the mean of the silhouette coefficients of all the observations in a k-means model. This metric enables data professionals to evaluate a model, taking into account the separation between clusters.
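The silhouette questions above can be made concrete with a small pure-Python sketch. It uses the standard definition of the silhouette coefficient, (b - a) / max(a, b), where a is a point's mean distance to the other points in its own cluster and b is its mean distance to the points of the nearest other cluster. The function names and toy data are assumptions for illustration, and the code assumes there are at least two clusters.

```python
import math


def silhouette_coefficients(points, labels):
    """Silhouette coefficient of every point, given its cluster label."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    coefficients = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            coefficients.append(0.0)  # common convention for one-point clusters
            continue
        # a: mean distance to the other points in the same cluster
        # (the distance from p to itself is 0, so it does not affect the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the points of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        coefficients.append((b - a) / max(a, b))
    return coefficients


def silhouette_score(points, labels):
    """Mean of the silhouette coefficients of all observations."""
    coefficients = silhouette_coefficients(points, labels)
    return sum(coefficients) / len(coefficients)


# Made-up toy data: two tight pairs of points, far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # each pair is its own cluster
bad = silhouette_score(points, [0, 1, 0, 1])   # pairs split across clusters
print(round(good, 3), round(bad, 3))
```

The sensible assignment scores close to 1, while the mixed-up assignment scores below zero. This matches the quiz answers: values near 1 indicate well-separated clusters, 0 indicates a boundary case, and negative values suggest points may be in the wrong cluster.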

CONCLUSION – Unsupervised Learning Techniques

In summary, this module covers how unsupervised learning differs from supervised learning, how K-means organizes unlabeled data into clusters defined by centroids, and how to evaluate and tune a model using inertia, silhouette score, and the elbow method.

By combining theoretical understanding with hands-on practice in Python, the module prepares learners to apply clustering to real-world data and provides a foundation for further work in machine learning.