COURSE 5: PREPARE FOR DP-100: DATA SCIENCE ON MICROSOFT AZURE EXAM

Module 2: Exam Preparation Course 1

MICROSOFT AZURE DATA SCIENTIST ASSOCIATE (DP-100) PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

Last updated: July 10, 2024

TABLE OF CONTENTS

Introduction
Quiz: Create machine learning models
Conclusion

INTRODUCTION – EXAM PREPARATION COURSE 1

In this module, you will review the content from Course 1 of the Microsoft Azure Data Scientist Associate specialization. This course covers foundational concepts and practical skills necessary for data science using Azure. You will revisit key topics such as data exploration, data preparation, and the basics of machine learning.

Additionally, you will refresh your understanding of how to use Azure Machine Learning service to build, train, and deploy machine learning models. This review will reinforce your knowledge and prepare you for more advanced topics in subsequent courses of the specialization.

Learning Objectives

Outline the key points covered in the Microsoft Azure Data Scientist Associate specialization
Recap the main topic in Course 1: Create machine learning models
Assess knowledge and skills in creating machine learning models

Quiz: Create machine learning models

1. Your manager has asked you to create a binary classification model to predict whether a person has a disease. You need to detect possible classification errors.

Which error type should you choose for the following description?

“A person has a disease. The model classifies the case as having a disease”.

False positives
False negatives
True negatives
True positives (CORRECT)

Correct: A true positive is an outcome where the model correctly predicts the positive class.

2. Your manager has asked you to create a binary classification model to predict whether a person has a disease. You need to detect possible classification errors.

Which error type should you choose for the following description?

“A person does not have a disease. The model classifies the case as having a disease”.

False negatives
True positives
False positives (CORRECT)
True negatives

Correct: A false positive is an outcome where the model incorrectly predicts the positive class.
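
As a rough illustration of the four outcomes these error-type questions ask about, the sketch below counts true positives, false positives, false negatives, and true negatives for a small set of labels. The function name `confusion_counts` and the sample labels are made up for this example; 1 is assumed to mean "has the disease" and 0 "does not".

```python
def confusion_counts(actual, predicted, positive=1):
    """Count the four binary classification outcomes."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # has the disease, classified as having it
        elif p == positive and a != positive:
            fp += 1  # no disease, classified as having it
        elif p != positive and a == positive:
            fn += 1  # has the disease, classified as not having it
        else:
            tn += 1  # no disease, classified as not having it
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical labels: 1 = has the disease, 0 = does not.
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)
```

Only the false positives and false negatives are classification errors; the two "true" outcomes are correct predictions.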

3. You are a senior data scientist in the company and you are tasked with evaluating a completed binary classification machine learning model.

You need to use the precision as the evaluation metric. Which visualization should you use?

Scatter plot
Gradient descent
Receiver Operating Characteristic (ROC) curve (CORRECT)
Violin plot

Correct: A receiver operating characteristic (ROC) curve plots the true positive rate against the false positive rate across classification thresholds for a particular model.
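
To make the idea of a ROC curve more concrete, here is a minimal sketch that computes its points by hand. It assumes the model outputs a score per case and that a case is classified as positive when its score is at or above the threshold; the function name `roc_points` and the sample data are hypothetical.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical true labels and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting these pairs with the false positive rate on the x-axis and the true positive rate on the y-axis gives the curve; a curve that hugs the top-left corner indicates a better classifier.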

4. You are a data scientist of a company and you are tasked with building a deep convolutional neural network (CNN) for image classification. The CNN model you built shows signs of overfitting. You need to reduce overfitting and converge the model to an optimal fit.

Which two actions should you perform?

Add an additional dense layer with 64 input units
Reduce the amount of training data
Add an additional dense layer with 512 input units
Add L1/L2 regularization (CORRECT)
Use training data augmentation (CORRECT)

Correct: Weight regularization provides an approach to reduce the overfitting of a deep learning neural network model on the training data and improve the performance of the model on new data, such as the holdout test set. L1L2 regularization penalizes the sum of the absolute and the squared weights.

Correct: Adding more training records should decrease the overfitting.
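
The penalty terms behind L1 and L2 regularization can be sketched in a few lines. This is only an illustration of the arithmetic, not how a deep learning framework applies it internally; the weights, the data loss, and the strength value 0.01 are made-up numbers.

```python
def l1_penalty(weights, strength):
    """L1 term: strength times the sum of absolute weights."""
    return strength * sum(abs(w) for w in weights)


def l2_penalty(weights, strength):
    """L2 term: strength times the sum of squared weights."""
    return strength * sum(w * w for w in weights)


# Hypothetical values for illustration only.
weights = [0.5, -1.0, 2.0]
data_loss = 0.30

# Combined L1L2 regularization adds both penalties to the data loss,
# so large weights make the total loss worse.
total_loss = data_loss + l1_penalty(weights, 0.01) + l2_penalty(weights, 0.01)
print(round(total_loss, 4))
```

Because the optimizer minimizes the total, it is pushed toward smaller weights, which tends to limit how closely the model can memorize the training data.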

5. Your manager has provided you a dataset created for multiclass classification tasks that contains a normalized numerical feature set with 10,000 data points and 150 features. You use 75 percent of the data points for training and 25 percent for testing.

Name	Description
X_train	Training feature set
Y_train	Training class labels
x_test	Testing feature set
y_test	Testing class labels

You need to apply the Principal Component Analysis (PCA) method to reduce the dimensionality of the feature set to 10 features in both training and testing sets.

You are using the scikit-learn machine learning library in Python.

You use X to denote the feature set and Y to denote class labels.

You create the following Python data frames:

from sklearn.decomposition import PCA

pca = […]

X_train = […].fit_transform(X_train)

x_test = pca.[…]

How should you complete the code segment?

Box1: PCA(n_components=10000);
Box2: pca;
Box3: X_train

Box1: PCA(n_components=10);
Box2: model;
Box3: transform(x_test)

Box1: PCA(n_components=150);
Box2: pca;
Box3: x_test

Box1: PCA(n_components=10); (CORRECT)
Box2: pca;
Box3: transform(x_test)

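Correct: PCA(n_components=10) reduces the feature set to 10 components. The PCA object is fitted on the training set with fit_transform, and the same fitted object is then applied to the testing set with transform.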
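
A runnable version of the PCA workflow in this question might look like the sketch below. It assumes scikit-learn and NumPy are installed and uses randomly generated data in place of the real dataset; the variable names with the `_reduced` suffix are chosen for this example and are not part of the question.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset: 10,000 data points, 150 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 150))
y = rng.integers(0, 3, size=10_000)

# 75 percent for training, 25 percent for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

pca = PCA(n_components=10)
# Learn the components from the training set only, then project it.
X_train_reduced = pca.fit_transform(X_train)
# Reuse the fitted components on the testing set; do not refit.
X_test_reduced = pca.transform(X_test)

print(X_train_reduced.shape, X_test_reduced.shape)
```

Calling `transform` rather than `fit_transform` on the testing set keeps both sets in the same reduced feature space and avoids leaking information from the test data.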

6. You are creating a model to predict the price of a student’s artwork depending on the following variables: the student’s length of education, degree type, and art form.

You start by creating a linear regression model. You need to evaluate the linear regression model.

Solution: Use the following metrics: Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error, Accuracy, Precision, Recall, F1 score, and AUC:

Does the solution meet the goal?

Yes
No (CORRECT)

Correct: Accuracy, Precision, Recall, F1 score, and AUC are metrics for evaluating classification models. Mean Absolute Error, Root Mean Absolute Error, and Relative Absolute Error are suitable for a linear regression model.

7. What happens when a NumPy array is multiplied by 5?

Array stays the same size, but each element is multiplied by 5. (CORRECT)
The new array will be 5 times longer, with the sequence repeated 5 times.
The new array will be 5 times longer, with the sequence repeated 5 times and also all the elements are multiplied by 5.

Correct: This is how a NumPy array behaves when multiplied by a number.
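
The difference between multiplying a NumPy array and multiplying a plain Python list can be checked directly. This short sketch assumes NumPy is installed; the sample values are arbitrary.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

scaled = arr * 5       # element-wise: same size, each element multiplied by 5
repeated = values * 5  # list repetition: 5 times longer, elements unchanged

print(scaled)
print(len(repeated), repeated[:6])
```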

8. You are creating a model and you want to evaluate it. One metric yields an absolute metric in the same unit as the label.

Which metric is described?

Root Mean Square Error (RMSE) (CORRECT)
Coefficient of Determination (known as R-squared or R2)
Mean Square Error (MSE)

Correct: This is the described metric. This means that the smaller the value, the better the model.
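
The relationship between MSE and RMSE can be sketched as follows. The function names and the sample values are hypothetical; think of the numbers as prices in dollars.

```python
import math


def mse(actual, predicted):
    """Mean of the squared errors (in squared label units)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of MSE, which brings the error back to the label's unit."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical labels and predictions, e.g. prices in dollars.
actual = [10.0, 12.0, 15.0, 20.0]
predicted = [12.0, 11.0, 15.0, 17.0]

print(mse(actual, predicted))             # squared dollars
print(round(rmse(actual, predicted), 4))  # dollars
```

Taking the square root is what makes RMSE directly comparable to the label values, whereas MSE is expressed in squared units.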

9. Complete the sentence:

The Support Vector Machine algorithm is an example of machine learning __________ type model.

Regression
Classification (CORRECT)
Clustering

Correct: Support Vector Machine is a well-established algorithm for classification.

10. It is well known that Python provides extensive functionality with powerful statistical and numerical libraries. What is TensorFlow useful for?

Offering simple and effective predictive data analysis
Supplying machine learning and deep learning capabilities (CORRECT)
Providing attractive data visualizations
Analyzing and manipulating data

Correct: TensorFlow supplies machine learning and deep learning capabilities. Offering simple and effective predictive data analysis describes Scikit-learn.

11. You are creating a binary classification by using a two-class logistic regression model. You need to evaluate the model results for imbalance. Which evaluation metric should you use?

Relative Squared Error
Mean Absolute Error
Relative Absolute Error
AUC Curve (CORRECT)

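Correct: The area under the ROC curve (AUC) summarizes how well the model separates the two classes across all thresholds, which makes it more informative than accuracy when the classes are imbalanced.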
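
One way to see why AUC suits imbalanced data is to compute it from its ranking interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below uses that definition with made-up labels and scores; the function name `auc` is chosen for this example.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced data: 2 positives, 8 negatives.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.05]

auc_value = auc(labels, scores)
# Accuracy of a useless model that always predicts the negative class.
always_negative_accuracy = labels.count(0) / len(labels)
# The same useless model gives every case the same score.
constant_auc = auc(labels, [0.5] * len(labels))

print(auc_value, always_negative_accuracy, constant_auc)
```

A model that always predicts the majority class still reaches 80 percent accuracy on this data, but its AUC is only 0.5, so AUC exposes the problem that accuracy hides.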

12. What happens when a list is multiplied by 5?

The new list remains the same size, but the elements are multiplied by 5.
The new list created has the length 5 times the original length with the sequence repeated 5 times and also all the elements are also multiplied by 5.
The new list created has the length 5 times the original length with the sequence repeated 5 times. (CORRECT)

Correct: This is how a list behaves when multiplied.

13. You are tasked to analyze a dataset containing historical data from a local taxi company. You are developing a regression model for this. Your goal is to predict the fare of a taxi trip. You need to select performance metrics to correctly evaluate the regression model.

Which two metrics can you use?

An F1 score that is low
An R-Squared value close to 0
An R-Squared value close to 1 (CORRECT)
A Root Mean Square Error value that is low (CORRECT)

Correct: RMSE and R2 are both metrics for regression models. Coefficient of determination, often referred to as R2, represents the predictive power of the model, typically as a value between 0 and 1. Zero means the model explains nothing (it does no better than always predicting the mean); 1 means there is a perfect fit.

Correct: RMSE and R2 are both metrics for regression models. Root mean squared error (RMSE) creates a single value that summarizes the error in the model.

14. You are creating a model and you want to evaluate it. For this, you look at a specific metric that is directly proportional to how well the model fits.

Which evaluation metric is described?

Mean Square Error (MSE)
Coefficient of Determination (known as R-squared or R2) (CORRECT)
Root Mean Square Error (RMSE)

Correct: This is the evaluation metric described. In essence, this metric represents how much of the variance between predicted and actual label values the model is able to explain.
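
The coefficient of determination can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The following is a minimal sketch with hypothetical values; the function name `r_squared` is chosen for this example.

```python
def r_squared(actual, predicted):
    """R2 = 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical labels and predictions.
actual = [10.0, 12.0, 15.0, 20.0]
predicted = [12.0, 11.0, 15.0, 17.0]

r2 = r_squared(actual, predicted)
perfect = r_squared(actual, actual)       # perfect fit
baseline = r_squared(actual, [14.25] * 4)  # always predict the mean of actual

print(round(r2, 4), perfect, baseline)
```

The closer the value is to 1, the more of the variance in the label the model explains, which is why a higher R2 indicates a better fit.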

15. Your manager has asked you to create a binary classification model to predict whether a person has a disease. You need to detect possible classification errors.

Which error type should you choose for the following description?

“A person does not have a disease. The model classifies the case as having no disease”.

True positives
True negatives (CORRECT)
False negatives
False positives

Correct: A true negative is an outcome where the model correctly predicts the negative class.

16. Your manager has asked you to create a binary classification model to predict whether a person has a disease. You need to detect possible classification errors.

Which error type should you choose for the following description?

“A person has a disease. The model classifies the case as having no disease”.

False positives
False negatives (CORRECT)
True positives
True negatives

Correct: A false negative is an outcome where the model incorrectly predicts the negative class.

17. You are creating a model to predict the price of a student’s artwork depending on the following variables: the student’s length of education, degree type, and art form.

You start by creating a linear regression model. You need to evaluate the linear regression model.

Solution: Use the following metrics: Relative Squared Error, Coefficient of Determination, Accuracy, Precision, Recall, F1 score, and AUC:

Does the solution meet the goal?

Yes
No (CORRECT)

CONCLUSION – EXAM PREPARATION COURSE 1

By the end of this module, you will have a solid understanding of the foundational concepts and practical skills necessary for data science using Azure, as covered in Course 1 of the Microsoft Azure Data Scientist Associate specialization. You will be well-prepared to advance to more complex topics, having refreshed your knowledge on data exploration, data preparation, and the basics of machine learning, as well as how to use the Azure Machine Learning service for building, training, and deploying models.
