COURSE 3: BUILD AND OPERATE MACHINE LEARNING SOLUTIONS WITH AZURE
Module 5: Select Models And Protect Sensitive Data
MICROSOFT AZURE DATA SCIENTIST ASSOCIATE (DP-100) PROFESSIONAL CERTIFICATE
Complete Coursera Study Guide
INTRODUCTION – Select Models And Protect Sensitive Data
In this module, you will learn how to leverage automated machine learning in Azure Machine Learning to identify the best model for your data. Additionally, you will explore differential privacy, a cutting-edge approach that allows for valuable analysis while safeguarding individually identifiable data values.
Furthermore, the module will cover the various factors that influence the predictions models make, providing you with a comprehensive understanding of how to optimize and protect your data-driven insights.
Learning Objectives
- Use Azure Machine Learning’s automated machine learning capabilities to determine the best performing algorithm for your data.
- Use automated machine learning to preprocess data for training.
- Run an automated machine learning experiment.
- Articulate the problem of data privacy.
- Describe how differential privacy works.
- Configure parameters for differential privacy.
- Perform differentially private data analysis.
- Interpret global and local feature importance.
- Use an explainer to interpret a model.
- Create model explanations in a training experiment.
- Visualize model explanations.
PRACTICE QUIZ: KNOWLEDGE CHECK 1
1. Which types of machine learning tasks does automated machine learning support for model training? Select all that apply.
- Clustering
- Classification (CORRECT)
- Regression (CORRECT)
- Time Series Forecasting (CORRECT)
Correct: Classification, regression, and time series forecasting tasks can all use automated machine learning in Azure ML to train models.
2. Which of the following classification algorithms does Azure Machine Learning support? Select all that apply.
- Linear Regression
- Logistic Regression (CORRECT)
- Deep Neural Network (DNN) Classifier (CORRECT)
- Decision Tree (CORRECT)
Correct: Logistic Regression, Deep Neural Network (DNN) Classifier, and Decision Tree are all classification algorithms that Azure ML supports.
3. Which of the following forecasting algorithms does Azure Machine Learning support? Select all that apply.
- Linear Support Vector Machine (SVM)
- Naive Bayes
- Light Gradient Boosting Machine (GBM) (CORRECT)
- Elastic Net (CORRECT)
Correct: Light GBM and Elastic Net are both forecasting and regression algorithms that Azure ML supports.
4. True or False?
Automated machine learning can apply preprocessing transformations to your data with the purpose of improving the performance of the model.
- True (CORRECT)
- False
Correct: Automated machine learning applies scaling and normalization to numeric data automatically, helping prevent any large-scale features from dominating training.
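To see why scaling matters, here is a minimal sketch in plain Python. It is not the automated ML implementation; the feature values and helper names are made up for illustration. It rescales two hypothetical numeric features, income and age, so that both end up on comparable ranges.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly into [0, 1]; assumes they are not all identical."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical features on very different scales.
income = [20000.0, 40000.0, 60000.0, 100000.0]
age = [20.0, 30.0, 40.0, 60.0]

scaled_income = min_max_scale(income)
scaled_age = min_max_scale(age)
z_income = standardize(income)

print(scaled_income)
print(scaled_age)
```

Before scaling, income values are about a thousand times larger than age values; after min-max scaling both features lie in the same [0, 1] range, so neither dominates purely because of its units.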
5. Which is one of the most important settings you must specify in relation to Automated ML?
- Second validation dataset or dataframe
- A numpy array of X values containing the training features
- Dataframe of training data
- The primary metric (CORRECT)
Correct: One of the most important settings you must specify is the primary_metric. This is the target performance metric for which the optimal model will be determined.
PRACTICE QUIZ: KNOWLEDGE CHECK 2
1. What is the name of the parameter that configures the amount of variation caused by adding noise?
- Epsilon (CORRECT)
- Lambda
- Psi
- Sigma
Correct: This value governs the amount of additional risk that your personal data can be identified through rejecting the opt-out option and participating in a study.
2. True or False?
The Epsilon parameter can apply the privacy principle to a specific group of people or everyone participating in a study.
- True
- False (CORRECT)
Correct: When using the epsilon parameter, the key thing to remember is that it applies the privacy principle for everyone participating in a study.
3. How does the epsilon value affect the balance between privacy and accuracy? Select all that apply.
- High epsilon value equals more privacy and less accuracy
- Low epsilon value equals less privacy and more accuracy
- High epsilon value equals less privacy and more accuracy (CORRECT)
- Low epsilon value equals more privacy and less accuracy (CORRECT)
Correct: A higher epsilon value results in aggregations that are truer to the actual data distribution, but in which the individual contribution of a single individual to the aggregated value is less obscured by noise.
Correct: A low epsilon value provides the most privacy, at the expense of less accuracy when aggregating the data.
4. Which of the following statements is true in a differential privacy solution?
- In a dataset, numeric values that are encrypted cannot be used
- During analysis, noise is added to the data so that aggregations are statistically consistent with the data distribution but non-deterministic (CORRECT)
- In a dataset, all columns that are numeric are converted to the mean value
Correct: In a differential privacy solution, noise is added to the data when generating analyses so that aggregations are statistically consistent but non-deterministic; and individual contributions to the aggregations cannot be determined.
5. What should you do in a differential privacy solution to ensure that an individual’s data has a low impact on the aggregated results?
- Set epsilon to a high value.
- Set epsilon to 0.5
- Set epsilon to a low value (CORRECT)
Correct: The lower the epsilon, the less impact an individual’s data has on aggregated results, and therefore the risk of exposure is reduced.
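The effect of epsilon can be illustrated with the Laplace mechanism, one common way of adding noise in differential privacy. The sketch below is plain Python, not the library used in the course; the ages, bounds, and function names are hypothetical, and it assumes the number of records is public. It releases a noisy mean with noise scale sensitivity / epsilon, so a lower epsilon produces larger noise.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_mean(values, lower, upper, epsilon, rng):
    """Return the mean of the clipped values plus noise calibrated to epsilon."""
    clipped = [min(max(v, lower), upper) for v in values]
    # Replacing one record moves the clipped mean by at most this much.
    sensitivity = (upper - lower) / len(clipped)
    return sum(clipped) / len(clipped) + laplace_noise(sensitivity / epsilon, rng)

def average_error(values, lower, upper, epsilon, trials=2000, seed=0):
    """Average absolute gap between the noisy mean and the true mean."""
    rng = random.Random(seed)
    true_mean = sum(values) / len(values)
    gaps = [abs(private_mean(values, lower, upper, epsilon, rng) - true_mean)
            for _ in range(trials)]
    return sum(gaps) / trials

ages = [23, 35, 41, 29, 52, 47, 38, 61, 33, 44]  # hypothetical data

low_eps_error = average_error(ages, 0, 100, epsilon=0.1)
high_eps_error = average_error(ages, 0, 100, epsilon=5.0)

print("average error with epsilon=0.1:", low_eps_error)
print("average error with epsilon=5.0:", high_eps_error)
```

The low-epsilon result is further from the true mean on average, which is the privacy and accuracy trade-off described in the questions above.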
QUIZ: TEST PREP
1. You need to retrieve the primary metric for a regression task. How can you code this in Python?
- from azureml.train.automl.utilities import get_primary_metrics (CORRECT)
  get_primary_metrics('regression')
- from azureml.train.automl.utilities import feed_primary_metrics
  feed_primary_metrics('regression')
- from azureml.train.automl.utilities import pull_primary_metrics
  pull_primary_metrics('regression')
- from azureml.train.automl.utilities import catch_primary_metrics
  catch_primary_metrics('regression')
Correct: This is the correct code expression.
2. You need to retrieve the best run and its model. How can you code that with the SDK?
- best_run, fitted_model = automl.run.get_output()
  best_run_metrics = best_run.get_metrics()
  for metric_name in best_run_metrics:
      metric = best_run_metrics[metric_name]
      print(metric_name, metric)
- best_run, fitted_model = automl_run.get_input()
  best_run_metrics = best_run.get_metrics()
  for metric_name in best_run_metrics:
      metric = best_run_metrics[metric_name]
      print(metric_name, metric)
- best_run, fitted_model = automl_run.get_output() (CORRECT)
  best_run_metrics = best_run.get_metrics()
  for metric_name in best_run_metrics:
      metric = best_run_metrics[metric_name]
      print(metric_name, metric)
- best_run, fitted_model = automl_run.get_output()
  best_run_metrics = best_run_get_metrics(1)
  for metric_name in best_run_metrics:
      metric = best_run_metrics[metric_name]
      print(metric_name, metric)
Correct: This would be the correct code for the task.
3. How can you code an instance of a MimicExplainer for a model named loan_model?
- from interpret.ext.blackbox import MimicExplainer
  from interpret.ext.glassbox import DecisionTreeExplainableModel
  mim_explainer = MimicExplainer(model=loan_model,
                                 explainable_model=DecisionTreeExplainableModel,
                                 classes=['loan_amount','income','age','marital_status'],
                                 features=['reject', 'approve'])
- from interpret.ext.blackbox import MimicExplainer (CORRECT)
  from interpret.ext.glassbox import DecisionTreeExplainableModel
  mim_explainer = MimicExplainer(model=loan_model,
                                 initialization_examples=X_test,
                                 explainable_model=DecisionTreeExplainableModel,
                                 features=['loan_amount','income','age','marital_status'],
                                 classes=['reject', 'approve'])
- from interpret.ext.blackbox import MimicExplainer
  from interpret.ext.glassbox import DecisionTreeExplainableModel
  mim_explainer = MimicExplainer(model=loan_model,
                                 initialization_examples=X_test,
                                 explainable_model=DecisionTreeExplainableModel,
                                 features=['loan_amount','income','age','marital_status'],
- from interpret.ext.blackbox import MimicExplainer
  from interpret.ext.glassbox import DecisionTreeExplainableModel
  mim_explainer = MimicExplainer(model=loan_model,
                                 initialization_examples=X_test,
                                 explainable_model=DecisionTree,
                                 classes=['loan_amount','income','age','marital_status'],
                                 features=['reject', 'approve'])
Correct: This would be the correct code for the task.
4. How can you code an instance of a TabularExplainer for a model named loan_model?
- from interpret.ext.blackbox import TabularExplainer
  tab_explainer = TabularExplainer(model=loan_model,
                                   initialization_examples=X_test,
                                   classes=['loan_amount','income','age','marital_status'],
                                   features=['reject', 'approve'])
- from interpret.ext.blackbox import TabularExplainer (CORRECT)
  tab_explainer = TabularExplainer(model=loan_model,
                                   initialization_examples=X_test,
                                   features=['loan_amount','income','age','marital_status'],
                                   classes=['reject', 'approve'])
- from interpret.ext.blackbox import Explainer
  tab_explainer = TabularExplainer(loan_model,
                                   explainable_model=DecisionTreeExplainableModel,
                                   features=['loan_amount','income','age','marital_status'],
                                   classes=['reject', 'approve'])
- from interpret.ext.blackbox import TabularExplainer
  tab_explainer = TabularExplainer(model=loan_model,
                                   explainable_model=DecisionTreeExplainableModel,
                                   initialization_examples=X_test,
                                   features=['loan_amount','income','age','marital_status'],
                                   classes=['reject', 'approve'])
Correct: This would be the correct code for the task.
5. How can you code a PFIExplainer for a model named loan_model?
- from interpret.ext.blackbox import PFIExplainer (CORRECT)
  pfi_explainer = PFIExplainer(model=loan_model,
                               features=['loan_amount','income','age','marital_status'],
                               classes=['reject', 'approve'])
- from interpret.ext.blackbox import PFIExplainer
  pfi_explainer = PFIExplainer(model=loan_model,
                               explainable_model=DecisionTreeExplainableModel,
                               features=['loan_amount','income','age','marital_status'],
                               classes=['reject', 'approve'])
- from interpret.ext.blackbox
  pfi_explainer = PFIExplainer(model=loan_model,
                               initialization_examples=X_test,
                               features=['loan_amount','income','age','marital_status'],
                               classes=['reject', 'approve'])
- from interpret.ext.blackbox import PFIExplainer
  pfi_explainer = PFIExplainer(model=loan_model,
                               initialization_examples=X_test,
                               classes=['loan_amount','income','age','marital_status'],
                               features=['reject', 'approve'])
Correct: This is the correct code for a PFIExplainer.
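PFI stands for permutation feature importance. As a rough illustration of the idea only (this is not how the PFIExplainer class is implemented, and the toy model and data are hypothetical), the sketch below shuffles one feature column at a time and records how much the model's accuracy drops.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the right label."""
    hits = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return hits / len(rows)

def permutation_importance(model, rows, labels, n_features, seed=0):
    """Accuracy drop per feature after shuffling that feature's column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        # Rebuild each row with only feature j replaced by a shuffled value.
        shuffled = [row[:j] + (value,) + row[j + 1:]
                    for row, value in zip(rows, column)]
        drops.append(baseline - accuracy(model, shuffled, labels))
    return drops

# Hypothetical toy model: approve (1) when income is at least 50; age is ignored.
def toy_model(row):
    income, age = row
    return 1 if income >= 50 else 0

rows = [(20, 25), (35, 40), (45, 31), (55, 52), (70, 29), (90, 47)]
labels = [toy_model(row) for row in rows]

importance = permutation_importance(toy_model, rows, labels, n_features=2)
print(dict(zip(["income", "age"], importance)))
```

Because the toy model never looks at age, shuffling the age column cannot change its predictions, so that feature's importance is zero. Any drop in accuracy can only come from shuffling income.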
6. You need to retrieve local feature importance from a TabularExplainer.
How can you code this in the SDK?
- local_tab_explanation = tab_explainer.explain_local(X_test[0:5]) (CORRECT)
  local_tab_features = local_tab_explanation.get_ranked_local_names()
  local_tab_importance = local_tab_explanation.get_ranked_local_values()
- local.tab_explanation = tab_explainer_explain_local(X_test[0:5])
  local_tab_features = local_tab_explanation.get_ranked_local_names()
  local_tab_importance = local_tab_explanation.get_ranked_local_values()
- local_tab_explanation = tab_explainer.explain_local(X_test[0:5])
  local_tab_features = local_tab_explanation.get_feature_importance_dict()
  local_tab_importance = local_tab_explanation.get_ranked_local_values()
- local_tab_explanation = tab_explainer.explain_local(X_test[0:5])
  local_tab_features = local_tab_explanation.get_feature_local_names()
  local_tab_importance = local_tab_explanation.get_ranked_local_values()
Correct: This is the correct code for this task.
7. Which packages do you need to install in the run environment to be able to create an explanation in the experiment script? Select all that apply.
- azureml-blackbox
- azureml-explainer
- azureml-interpret (CORRECT)
- azureml-contrib-interpret (CORRECT)
Correct: You need to ensure this package is installed in the run environment to create an explanation in your experiment script.
Correct: You need to ensure this package is installed in the run environment to create an explanation in your experiment script.
8. Azure Machine Learning supports many commonly used algorithms. Which of the following types of algorithms are supported?
Choose all options that apply.
- Classification algorithms (CORRECT)
- Regression algorithms (CORRECT)
- Forecasting algorithms (CORRECT)
Correct: Azure Machine Learning includes support for Classification algorithms.
Correct: Azure Machine Learning includes support for Regression algorithms.
Correct: Azure Machine Learning includes support for Forecasting algorithms.
9. Which of the following features does Automated machine learning apply to numeric data automatically?
Choose all options that apply.
- Auto-loading
- Scaling (CORRECT)
- Normalization (CORRECT)
Correct: Automated machine learning applies scaling to numeric data automatically.
Correct: Automated machine learning applies normalization to numeric data automatically.
10. Which of the following statements is true?
- Differential privacy seeks to protect individual data values by adding statistical “noise” to the analysis process. (CORRECT)
- Differential privacy seeks to protect individual data values by removing statistical “noise” to the analysis process.
Correct: Differential privacy seeks to protect individual data values by adding statistical “noise” to the analysis process.
11. Which of the following statements is true?
- Global feature importance quantifies the relative importance of each feature in the test dataset as a whole. (CORRECT)
- Global feature importance measures the influence of each feature value for a specific individual prediction.
Correct: Global feature importance quantifies the relative importance of each feature in the test dataset as a whole.
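The difference between global and local importance can be made concrete with a hypothetical linear model. In this simplified sketch (made-up coefficients and data, not the output of an Azure ML explainer), a feature's local contribution to one prediction is its coefficient times the feature's deviation from the dataset mean, and its global score is the mean absolute local contribution across all rows.

```python
from statistics import mean

features = ["income", "age"]
coefficients = [0.5, -2.0]   # hypothetical linear model weights
rows = [
    [10.0, 1.0],
    [20.0, 2.0],
    [30.0, 3.0],
]

# Per-feature means, used to centre each value before weighting it.
means = [mean(column) for column in zip(*rows)]

def local_importance(row):
    """Contribution of each feature to this single row's prediction."""
    return [c * (x - m) for c, x, m in zip(coefficients, row, means)]

# Local importance: one list of contributions per individual row.
local = [local_importance(row) for row in rows]

# Global importance: average magnitude of the contributions over the dataset.
global_importance = [mean(abs(contrib[j]) for contrib in local)
                     for j in range(len(features))]

print("local, first row:", dict(zip(features, local[0])))
print("global:", dict(zip(features, global_importance)))
```

The local values differ from row to row and can be positive or negative, while the global values give a single ranking of features for the dataset as a whole.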
CONCLUSION – Select Models And Protect Sensitive Data
By mastering automated machine learning in Azure Machine Learning and understanding differential privacy, you will be equipped to find the best models for your data while protecting sensitive information. Additionally, learning about the factors that influence model predictions will enhance your ability to optimize data-driven insights. These skills will enable you to build more accurate, secure, and effective machine learning solutions.

