COURSE 3: BUILD AND OPERATE MACHINE LEARNING SOLUTIONS WITH AZURE

Module 5: Select Models And Protect Sensitive Data

MICROSOFT AZURE DATA SCIENTIST ASSOCIATE (DP-100) PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide


Last updated: July 10, 2024

INTRODUCTION – Select Models And Protect Sensitive Data

In this module, you will learn how to leverage automated machine learning in Azure Machine Learning to identify the best model for your data. Additionally, you will explore differential privacy, a cutting-edge approach that allows for valuable analysis while safeguarding individually identifiable data values.

The module also covers model interpretability: how to determine which factors influence the predictions your models make, both across a whole dataset and for individual predictions.

Learning Objectives

Use Azure Machine Learning’s automated machine learning capabilities to determine the best performing algorithm for your data.
Use automated machine learning to preprocess data for training.
Run an automated machine learning experiment.
Articulate the problem of data privacy.
Describe how differential privacy works.
Configure parameters for differential privacy.
Perform differentially private data analysis.
Interpret global and local feature importance.
Use an explainer to interpret a model.
Create model explanations in a training experiment.
Visualize model explanations.

PRACTICE QUIZ: KNOWLEDGE CHECK 1

1. Which types of machine learning tasks support automated machine learning in model training? Select all that apply.

Clustering
Classification (CORRECT)
Regression (CORRECT)
Time Series Forecasting (CORRECT)

Correct: Classification, regression, and time series forecasting tasks can all use automated machine learning in Azure ML to train models.

2. Which of the following classification algorithms are supported by Azure Machine Learning? Select all that apply.

Linear Regression
Logistic Regression (CORRECT)
Deep Neural Network (DNN) Classifier (CORRECT)
Decision Tree (CORRECT)

Correct: These classification algorithms are supported by Azure ML.

3. Which of the following forecasting algorithms are supported by Azure Machine Learning? Select all that apply.

Linear Support Vector Machine (SVM)
Naive Bayes
Light Gradient Boosting Machine (GBM) (CORRECT)
Elastic Net (CORRECT)

Correct: These forecasting and regression algorithms are supported by Azure ML.

4. True or False?

Automated machine learning can apply preprocessing transformations to your data with the purpose of improving the performance of the model.

True (CORRECT)
False

Correct: Automated machine learning applies scaling and normalization to numeric data automatically, helping prevent any large-scale features from dominating training.
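
To see why scaling matters, here is a minimal sketch in plain Python. It is not the code automated machine learning runs internally; the feature names and values are hypothetical, and the two helper functions are written only for this illustration.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly into [0, 1]. Assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical features on very different scales.
income = [20_000, 50_000, 80_000]
age = [25, 40, 55]

scaled_income = min_max_scale(income)
scaled_age = min_max_scale(age)
standardized_income = standardize(income)

# After scaling, both features span the same range, so neither
# dominates a distance-based or gradient-based learner by magnitude alone.
print(scaled_income)
print(scaled_age)
```

Before scaling, income differs between rows by tens of thousands while age differs by tens; after scaling, both features cover the same range.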

5. Which is one of the most important settings you must specify in relation to Automated ML?

Second validation dataset or dataframe
A numpy array of X values containing the training features
Dataframe of training data
The primary metric (CORRECT)

Correct: One of the most important settings you must specify is the primary_metric. This is the target performance metric for which the optimal model will be determined.
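
The role of the primary metric can be sketched without the SDK. In the example below, the run names and metric values are made up for illustration and are not real automated ML output; the point is only that the "best" model depends on which metric you nominate.

```python
# Hypothetical metric values for three candidate models (illustration only).
candidate_runs = {
    "model_a": {"AUC_weighted": 0.91, "accuracy": 0.84},
    "model_b": {"AUC_weighted": 0.88, "accuracy": 0.87},
    "model_c": {"AUC_weighted": 0.90, "accuracy": 0.85},
}

def best_model(runs, primary_metric):
    """Return the run name with the highest value of the chosen metric.

    Assumes a higher-is-better metric; error metrics would need min().
    """
    return max(runs, key=lambda name: runs[name][primary_metric])

print(best_model(candidate_runs, "AUC_weighted"))
print(best_model(candidate_runs, "accuracy"))
```

With these made-up numbers, choosing AUC_weighted and choosing accuracy select different models, which is why the setting deserves deliberate thought.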

PRACTICE QUIZ: KNOWLEDGE CHECK 2

1. What is the name of the parameter that configures the amount of variation caused by adding noise?

Epsilon (CORRECT)
Lambda
Psi
Sigma

Correct: Epsilon controls how much noise is added. It governs the amount of additional risk that your personal data can be identified if you decline the opt-out option and participate in a study.

2. True or False?

The Epsilon parameter can apply the privacy principle to a specific group of people or everyone participating in a study.

True
False (CORRECT)

Correct: When using the epsilon parameter, the key thing to remember is that it applies the privacy principle for everyone participating in a study.

3. What is the relationship between the epsilon value and privacy and accuracy? Select all that apply.

High epsilon value equals more privacy and less accuracy
Low epsilon value equals less privacy and more accuracy
High epsilon value equals less privacy and more accuracy (CORRECT)
Low epsilon value equals more privacy and less accuracy (CORRECT)

Correct: A higher epsilon value results in aggregations that are truer to the actual data distribution, but in which the contribution of a single individual to the aggregated value is less obscured by noise.

Correct: A low epsilon value provides the most privacy, at the expense of less accuracy when aggregating the data.
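
The trade-off can be illustrated with the Laplace mechanism, one common way to add differentially private noise. This is a sketch under assumptions: the guide does not say which mechanism the course tooling uses, and the count, the sensitivity of 1, and the function names are all hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism: the noise scale is sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

def mean_abs_error(true_count, epsilon, trials=2000, seed=0):
    """Average distance between the noisy and true count over repeated queries."""
    rng = random.Random(seed)
    errors = [abs(private_count(true_count, epsilon, rng) - true_count)
              for _ in range(trials)]
    return sum(errors) / trials

true_count = 1000  # hypothetical number of matching records
for epsilon in (0.1, 1.0, 10.0):
    # Lower epsilon -> larger noise scale -> more privacy, less accuracy.
    print(epsilon, round(mean_abs_error(true_count, epsilon), 2))
```

The printed error shrinks as epsilon grows, matching the two correct answers above.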

4. Which of the following statements is true in a differential privacy solution?

In a dataset, numeric values that are encrypted cannot be used
During analysis, noise is added to the data so that aggregations are statistically consistent with the data distribution but non-deterministic (CORRECT)
In a dataset, all columns that are numeric are converted to the mean value

Correct: In a differential privacy solution, noise is added to the data when generating analyses so that aggregations are statistically consistent but non-deterministic; and individual contributions to the aggregations cannot be determined.

5. What should you do in a differential privacy solution to ensure that an individual’s data has a low impact on the aggregated results?

Set epsilon to a high value.
Set epsilon to 0.5
Set epsilon to a low value (CORRECT)

Correct: The lower the epsilon, the less impact an individual’s data has on aggregated results, and therefore the risk of exposure is reduced.
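
The reason a low epsilon limits an individual's impact follows from the standard definition of epsilon-differential privacy, stated here for reference. The symbols are not from the course material: M is the randomized analysis, D and D' are any two datasets that differ in one individual's record, and S is any set of possible outputs.

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
```

As epsilon approaches 0, the factor e^epsilon approaches 1, so the results look almost the same whether or not a given individual's data is included.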

QUIZ: TEST PREP

1. You need to retrieve the primary metric for a regression task. How can you code this in Python?

from azureml.train.automl.utilities import get_primary_metrics (CORRECT)
get_primary_metrics('regression')

from azureml.train.automl.utilities import feed_primary_metrics
feed_primary_metrics('regression')

from azureml.train.automl.utilities import pull_primary_metrics
pull_primary_metrics('regression')

from azureml.train.automl.utilities import catch_primary_metrics
catch_primary_metrics('regression')

Correct: The get_primary_metrics function returns the primary metrics available for the task type you pass in.

2. You need to retrieve the best run and its model. How can you code that with the SDK?

best_run, fitted_model = automl.run.get_output()
best_run_metrics = best_run.get_metrics()
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)

best_run, fitted_model = automl_run.get_input()
best_run_metrics = best_run.get_metrics()
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)

best_run, fitted_model = automl_run.get_output() (CORRECT)
best_run_metrics = best_run.get_metrics()
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)

best_run, fitted_model = automl_run.get_output()
best_run_metrics = best_run_get_metrics(1)
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)

Correct: This would be the correct code for the task.

3. How can you code an instance of a MimicExplainer for a model named loan_model?

from interpret.ext.blackbox import MimicExplainer
from interpret.ext.glassbox import DecisionTreeExplainableModel
mim_explainer = MimicExplainer(model=loan_model,
                               explainable_model=DecisionTreeExplainableModel,
                               classes=['loan_amount', 'income', 'age', 'marital_status'],
                               features=['reject', 'approve'])

from interpret.ext.blackbox import MimicExplainer (CORRECT)
from interpret.ext.glassbox import DecisionTreeExplainableModel
mim_explainer = MimicExplainer(model=loan_model,
                               initialization_examples=X_test,
                               explainable_model=DecisionTreeExplainableModel,
                               features=['loan_amount', 'income', 'age', 'marital_status'],
                               classes=['reject', 'approve'])

from interpret.ext.blackbox import MimicExplainer
from interpret.ext.glassbox import DecisionTreeExplainableModel
mim_explainer = MimicExplainer(model=loan_model,
                               initialization_examples=X_test,
                               explainable_model=DecisionTreeExplainableModel,
                               features=['loan_amount', 'income', 'age', 'marital_status'],

from interpret.ext.blackbox import MimicExplainer
from interpret.ext.glassbox import DecisionTreeExplainableModel
mim_explainer = MimicExplainer(model=loan_model,
                               initialization_examples=X_test,
                               explainable_model=DecisionTree,
                               classes=['loan_amount', 'income', 'age', 'marital_status'],
                               features=['reject', 'approve'])

Correct: This would be the correct code for the task.
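
A MimicExplainer builds a global surrogate: an interpretable model trained to approximate the predictions of the model you want to explain. The sketch below illustrates that idea in plain Python only. It does not use the interpret library, and the stand-in black box, the one-split "tree", and the feature names are all hypothetical.

```python
import random

def black_box(row):
    """Stand-in for an opaque model: approve (1) when a weighted score is high."""
    income, age = row
    return 1 if 0.8 * income + 0.2 * age > 0.5 else 0

def fit_stump(rows, labels):
    """Fit a one-split decision tree: choose the (feature, threshold) whose rule
    'predict 1 when row[feature] > threshold' agrees with the labels most often."""
    best = (0, 0.0, -1.0)  # (feature index, threshold, agreement)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            preds = [1 if row[feature] > threshold else 0 for row in rows]
            agreement = sum(p == y for p, y in zip(preds, labels)) / len(rows)
            if agreement > best[2]:
                best = (feature, threshold, agreement)
    return best

rng = random.Random(42)
rows = [(rng.random(), rng.random()) for _ in range(200)]
# The surrogate learns from the black box's predictions, not from true labels.
mimic_labels = [black_box(row) for row in rows]

feature, threshold, fidelity = fit_stump(rows, mimic_labels)
feature_names = ["income", "age"]  # hypothetical, already scaled to [0, 1]
print(f"surrogate rule: {feature_names[feature]} > {threshold:.2f}")
print(f"fidelity to black box: {fidelity:.2f}")
```

The surrogate's simple rule can be read directly, and its fidelity tells you how far to trust it as a description of the original model.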

4. How can you code an instance of a TabularExplainer for a model named loan_model?

from interpret.ext.blackbox import TabularExplainer
tab_explainer = TabularExplainer(model=loan_model,
                                 initialization_examples=X_test,
                                 classes=['loan_amount', 'income', 'age', 'marital_status'],
                                 features=['reject', 'approve'])

from interpret.ext.blackbox import TabularExplainer (CORRECT)
tab_explainer = TabularExplainer(model=loan_model,
                                 initialization_examples=X_test,
                                 features=['loan_amount', 'income', 'age', 'marital_status'],
                                 classes=['reject', 'approve'])

from interpret.ext.blackbox import Explainer
tab_explainer = TabularExplainer(loan_model,
                                 explainable_model=DecisionTreeExplainableModel,
                                 features=['loan_amount', 'income', 'age', 'marital_status'],
                                 classes=['reject', 'approve'])

from interpret.ext.blackbox import TabularExplainer
tab_explainer = TabularExplainer(model=loan_model,
                                 explainable_model=DecisionTreeExplainableModel,
                                 initialization_examples=X_test,
                                 features=['loan_amount', 'income', 'age', 'marital_status'],
                                 classes=['reject', 'approve'])

Correct: This would be the correct code for the task.

5. How can you code a PFIExplainer for a model named loan_model?

from interpret.ext.blackbox import PFIExplainer (CORRECT)
pfi_explainer = PFIExplainer(model=loan_model,
                             features=['loan_amount', 'income', 'age', 'marital_status'],
                             classes=['reject', 'approve'])

from interpret.ext.blackbox import PFIExplainer
pfi_explainer = PFIExplainer(model=loan_model,
                             explainable_model=DecisionTreeExplainableModel,
                             features=['loan_amount', 'income', 'age', 'marital_status'],
                             classes=['reject', 'approve'])

from interpret.ext.blackbox
pfi_explainer = PFIExplainer(model=loan_model,
                             initialization_examples=X_test,
                             features=['loan_amount', 'income', 'age', 'marital_status'],
                             classes=['reject', 'approve'])

from interpret.ext.blackbox import PFIExplainer
pfi_explainer = PFIExplainer(model=loan_model,
                             initialization_examples=X_test,
                             classes=['loan_amount', 'income', 'age', 'marital_status'],
                             features=['reject', 'approve'])

Correct: This is the correct code for a PFIExplainer.
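
PFI stands for permutation feature importance: shuffle one feature's values and measure how much prediction performance drops. The following is a minimal sketch of that technique in plain Python, not the PFIExplainer implementation; the stand-in model, data, and feature names are hypothetical.

```python
import random

def model(row):
    """Stand-in classifier that uses only the first feature."""
    return 1 if row[0] > 0.5 else 0

def accuracy(rows, labels):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(rows, labels, feature, rng):
    """Drop in accuracy after shuffling one feature column across rows."""
    baseline = accuracy(rows, labels)
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    shuffled = [r[:feature] + (v,) + r[feature + 1:]
                for r, v in zip(rows, column)]
    return baseline - accuracy(shuffled, labels)

rng = random.Random(0)
rows = [(rng.random(), rng.random()) for _ in range(500)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]  # labels depend on feature 0 only

importances = {name: permutation_importance(rows, labels, i, rng)
               for i, name in enumerate(["income", "age"])}
print(importances)
```

Shuffling the feature the model relies on hurts accuracy, while shuffling the ignored feature changes nothing, so its importance is zero.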

6. You need to retrieve local feature importance from a TabularExplainer.

How can you code this in the SDK?

local_tab_explanation = tab_explainer.explain_local(X_test[0:5]) (CORRECT)
local_tab_features = local_tab_explanation.get_ranked_local_names()
local_tab_importance = local_tab_explanation.get_ranked_local_values()

local.tab_explanation = tab_explainer_explain_local(X_test[0:5])
local_tab_features = local_tab_explanation.get_ranked_local_names()
local_tab_importance = local_tab_explanation.get_ranked_local_values()

local_tab_explanation = tab_explainer.explain_local(X_test[0:5])
local_tab_features = local_tab_explanation.get_feature_importance_dict()
local_tab_importance = local_tab_explanation.get_ranked_local_values()

local_tab_explanation = tab_explainer.explain_local(X_test[0:5])
local_tab_features = local_tab_explanation.get_feature_local_names()
local_tab_importance = local_tab_explanation.get_ranked_local_values()

Correct: This is the correct code for this task.

7. Which packages do you need to install in the run environment to be able to create an explanation in the experiment script? Select all that apply.

azureml-blackbox
azureml-explainer
azureml-interpret (CORRECT)
azureml-contrib-interpret (CORRECT)

Correct: You need to ensure that both the azureml-interpret and azureml-contrib-interpret packages are installed in the run environment to create an explanation in your experiment script.

8. Azure Machine Learning includes support for numerous commonly used algorithms for these tasks. Which of the following algorithms are supported?

Choose all options that apply.

Classification algorithms (CORRECT)
Regression algorithms (CORRECT)
Forecasting algorithms (CORRECT)

Correct: Azure Machine Learning includes support for Classification algorithms.

Correct: Azure Machine Learning includes support for Regression algorithms.

Correct: Azure Machine Learning includes support for Forecasting algorithms.

9. Which of the following features does Automated machine learning apply to numeric data automatically?

Choose all options that apply.

Auto-loading
Scaling (CORRECT)
Normalization (CORRECT)

Correct: Automated machine learning applies scaling to numeric data automatically.

Correct: Automated machine learning applies normalization to numeric data automatically.

10. Which of the following statements is true?

Differential privacy seeks to protect individual data values by adding statistical “noise” to the analysis process. (CORRECT)
Differential privacy seeks to protect individual data values by removing statistical “noise” from the analysis process.

Correct: Differential privacy seeks to protect individual data values by adding statistical “noise” to the analysis process.

11. Which of the following statements is true?

Global feature importance quantifies the relative importance of each feature in the test dataset as a whole. (CORRECT)
Global feature importance measures the influence of each feature value for a specific individual prediction.

Correct: Global feature importance quantifies the relative importance of each feature in the test dataset as a whole.
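
The difference between global and local importance can be illustrated with a toy linear model. This sketch uses one simple attribution (weight times deviation from the feature mean), which is an assumption made for illustration and not necessarily what the Azure ML explainers compute; the weights and rows are hypothetical.

```python
from statistics import mean

weights = {"income": 2.0, "age": -0.5}  # hypothetical linear model coefficients
rows = [
    {"income": 1.0, "age": 4.0},
    {"income": 3.0, "age": 2.0},
    {"income": 2.0, "age": 6.0},
]
feature_means = {f: mean(r[f] for r in rows) for f in weights}

def local_importance(row):
    """Per-prediction contribution: weight times deviation from the feature mean."""
    return {f: weights[f] * (row[f] - feature_means[f]) for f in weights}

def global_importance(rows):
    """Dataset-level importance: mean absolute local contribution per feature."""
    local = [local_importance(r) for r in rows]
    return {f: mean(abs(l[f]) for l in local) for f in weights}

for row in rows:
    print(local_importance(row))
print(global_importance(rows))
```

Income is the more important feature globally, yet for the third row it contributes nothing because that row's income equals the average. Local importance answers "why this prediction?", while global importance answers "what matters overall?".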

CONCLUSION – Select Models And Protect Sensitive Data

By mastering automated machine learning in Azure Machine Learning and understanding differential privacy, you will be equipped to find the best models for your data while protecting sensitive information. Additionally, learning about the factors that influence model predictions will enhance your ability to optimize data-driven insights. These skills will enable you to build more accurate, secure, and effective machine learning solutions.
