COURSE 3: BUILD AND OPERATE MACHINE LEARNING SOLUTIONS WITH AZURE

Module 4: Deploy Batch Inference Pipelines And Tune Hyperparameters With Azure Machine Learning

MICROSOFT AZURE DATA SCIENTIST ASSOCIATE (DP-100) PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

Last updated:

INTRODUCTION – Deploy Batch Inference Pipelines And Tune Hyperparameters With Azure Machine Learning

Machine learning models are frequently utilized to generate predictions from large datasets through batch processing. In this module, you’ll learn how to use Azure Machine Learning to publish a batch inference pipeline, allowing you to handle these extensive prediction tasks efficiently. Additionally, you’ll leverage cloud-scale experiments to select the optimal hyperparameter values for model training, ensuring your models perform at their best.

Learning Objectives

  • Publish a batch inference pipeline for a trained model.
  • Use a batch inference pipeline to generate predictions.
  • Define a hyperparameter search space.
  • Configure hyperparameter sampling.
  • Select an early-termination policy.
  • Run a hyperparameter tuning experiment.

PRACTICE QUIZ: KNOWLEDGE CHECK

1. What is the terminology used for long-running tasks that operate on large volumes of data?

  • Cluster operations
  • Bunch operations
  • Stack operations
  • Batch operations (CORRECT)

Correct: Long-running tasks that operate on large volumes of data are performed as batch operations. In machine learning, batch inferencing is used to apply a predictive model to multiple cases asynchronously – usually writing the results to a file or database.

2. When creating a batch inferencing pipeline, which of the tasks below should be performed first?

  • Run the pipeline and retrieve the step output
  • Create a scoring script
  • Register a model (CORRECT)
  • Create a pipeline with a ParallelRunStep

Correct: This has to be the first step. To use a trained model in a batch inferencing pipeline, you must register it in your Azure Machine Learning workspace.

3. Which two functions are included in the scoring script of a batch inference pipeline? Select all that apply.

  • init(batch)
  • run(batch)
  • init() (CORRECT)
  • run(mini_batch) (CORRECT)

Correct: This function is called when the pipeline is initialized.

Correct: This function is called for each batch of data to be processed.
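The two functions above form the skeleton of the batch-scoring entry script. A minimal sketch of that control flow, with the model load stubbed out so it can run on its own (a real entry script would load the registered model via Model.get_model_path, as noted in the comments):

```python
# Sketch of a batch-scoring entry script. Assumption: a registered
# model named 'classification_model' saved with joblib; the load is
# stubbed here so the init()/run(mini_batch) flow is runnable locally.

model = None

def init():
    # Called once when the pipeline step starts: load the model here.
    global model
    # In a real entry script (requires an Azure ML run context):
    #   from azureml.core import Model
    #   import joblib
    #   model_path = Model.get_model_path('classification_model')
    #   model = joblib.load(model_path)
    model = lambda path: len(path) % 2   # stand-in predictor

def run(mini_batch):
    # Called for each mini-batch of input files; return one result per file.
    results = []
    for path in mini_batch:
        prediction = model(path)
        results.append(f"{path}: {prediction}")
    return results
```

The results returned by run() are collated by the pipeline step into the output file you retrieve later.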

4. What is the type of ParallelRunStep that must be used in the pipeline for parallel batch inferencing? Select all that apply.

  • Parameter
  • Method
  • Object (CORRECT)
  • Class (CORRECT)

Correct: You can create objects from the ParallelRunStep class to be used in the pipeline.

Correct: You must import the ParallelRunStep class in order to use it in the pipeline.

5. After you run your pipeline, in which file can you observe the results?

  • parallel_run_config
  • OutputFileDatasetConfig
  • parallel_run_step.txt (CORRECT)

Correct: You can retrieve the parallel_run_step.txt file from the output of the step to view the results.

PRACTICE QUIZ: KNOWLEDGE CHECK

1. What are hyperparameters?

  • Values that are passed into a function
  • Values determined from the training features
  • Values used to configure training behavior which are not derived from the training data (CORRECT)

Correct: Hyperparameters are values that configure training behavior; unlike model parameters, they are not derived from the training data.

2. What does the process of hyperparameter tuning consist of?

  • Training multiple models, using the same algorithm, training data, and hyperparameter values.
  • Training multiple models, using the same algorithm but different training data and different hyperparameter values.
  • Training multiple models, using different algorithms, same training data and different hyperparameter values.
  • Training multiple models, using the same algorithm and training data but different hyperparameter values. (CORRECT)

Correct: This is the process of hyperparameter tuning. The resulting model from each training run is then evaluated to determine the performance metric for which you want to optimize (for example, accuracy), and the best-performing model is selected.
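A toy illustration of the loop just described: the same algorithm and data are trained with different hyperparameter values, and the run with the best metric wins. Here train_and_score is a stand-in for a real training run; an actual experiment would submit HyperDrive child runs to a compute target.

```python
# Toy tuning loop: same algorithm and data, different hyperparameter
# values; keep the value whose evaluation metric is best.

def train_and_score(learning_rate):
    # Pretend training run whose "accuracy" peaks at learning_rate = 0.1.
    return 1.0 - abs(learning_rate - 0.1)

search_space = [0.001, 0.01, 0.1, 1.0]
best_value = max(search_space, key=train_and_score)
print(best_value)  # 0.1
```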

3. Which of the following are valid discrete distributions from which you can select discrete values for discrete hyperparameters? Select all that apply.

  • Qbasic
  • Qloguniform (CORRECT)
  • Qnormal (CORRECT)
  • Quniform (CORRECT)
  • Qlognormal (CORRECT)

Correct: This is a valid discrete distribution.

Correct: This is a valid discrete distribution.

Correct: This is a valid discrete distribution.

Correct: This is a valid discrete distribution.
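Each q-prefixed expression yields discrete values by drawing from the underlying continuous distribution and rounding to a multiple of q, i.e. round(draw / q) * q. A plain-Python sketch of that rounding semantics (not the azureml implementation itself):

```python
# Sketch of the q-distribution semantics: draw from the continuous
# distribution, then snap to the nearest multiple of q.
import random

def quniform(low, high, q):
    return round(random.uniform(low, high) / q) * q

random.seed(0)                 # fixed seed so the sketch is repeatable
value = quniform(1, 100, 10)
print(value)                   # a multiple of 10 within [0, 100]
```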

4. Which of the following are valid types of sampling used in hyperparameter tuning? Select all that apply.

  • Byzantine sampling
  • Grid sampling (CORRECT)
  • Bayesian sampling (CORRECT)
  • Random sampling (CORRECT)

Correct: Grid sampling can only be employed when all hyperparameters are discrete.

Correct: Bayesian sampling chooses hyperparameter values based on the Bayesian optimization algorithm.

Correct: Random sampling is used to randomly select a value for each hyperparameter.
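Grid sampling, the most restrictive of the three, tries every combination of the discrete values. A sketch of that enumeration using itertools.product (GridParameterSampling performs the equivalent over a choice-only search space):

```python
# Grid sampling enumerated by hand: every (batch_size, learning_rate)
# combination in a discrete search space.
from itertools import product

batch_sizes = [128, 256, 512]
learning_rates = [0.01, 0.1]

grid = list(product(batch_sizes, learning_rates))
print(len(grid))  # 6 combinations: 3 batch sizes x 2 learning rates
```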

5. Which of the following are valid types of early termination policies you can implement? Select all that apply.

  • Median stopping policy (CORRECT)
  • Waiting policy
  • Bandit policy (CORRECT)
  • Truncation selection policy (CORRECT)

Correct: A median stopping policy abandons runs whose target performance metric is worse than the median of the running averages across all runs.

Correct: A bandit policy stops a run if the target performance metric underperforms the best run so far by a specified margin.

Correct: A truncation selection policy cancels the lowest-performing X% of runs at each evaluation interval, based on the truncation_percentage value you specify for X.
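The bandit rule is simple enough to state in plain Python: stop a run when its metric trails the best run so far by more than the allowed slack. This sketch mirrors BanditPolicy's slack_amount parameter but is not the azureml implementation:

```python
# Bandit-policy rule sketched by hand: a run stops when it falls more
# than slack_amount behind the best metric observed so far.
def should_stop(current_metric, best_metric, slack_amount=0.2):
    return (best_metric - current_metric) > slack_amount

print(should_stop(0.70, 0.95))  # True: 0.25 behind, beyond the slack
print(should_stop(0.90, 0.95))  # False: only 0.05 behind
```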

QUIZ: TEST PREP

1. To register a model using a reference to the Run used to train the model, which SDK commands can you use?

  • from azureml.core import Model (CORRECT)
  • run.register_model(model_name='classification_model',
  •  model_path='outputs/model.pkl',
  •  description='A classification model')
  • from azureml.core import Object
  • classification_model = Model.register(workspace=your_workspace,
  •  model_name='classification_model',
  •  model_path='model.pkl',
  •  description='A classification model')
  • from azureml.core import Object
  • run.register_model(model_name='classification_model',
  •  model_path='outputs/model.pkl',
  •  description='A classification model')
  • from azureml.core import Model
  • classification_model = Model.register(workspace=your_workspace,
  •  model_name='classification_model',
  •  model_path='model.pkl',
  •  description='A classification model')

Correct: These are the correct commands for the job.

2. Which of the following SDK commands can you use to create a parallel run step?

  • parallelrun_step = ParallelRunStep(
  •  name='batch-score',
  •  parallel_run_config=parallel.run.config,
  •  inputs=[batch_data_set.as_named_input('batch_data')],
  •  output=output_dir,
  •  arguments=[],
  •  allow_reuse=True)
  • parallelrun_step = ParallelRunStep(
  •  name='batch-score',
  •  parallel.run.config=parallel_run_config,
  •  inputs=[batch_data_set.as_named_input('batch_data')],
  •  output=output_dir,
  •  arguments=[],
  •  allow_reuse=True)
  • parallelrun.step = ParallelRunStep(
  •  name='batch-score',
  •  parallel_run_config=parallel_run_config,
  •  inputs=[batch_data_set.as_named_input('batch_data')],
  •  output=output_dir,
  •  arguments=[],
  •  allow_reuse=True)
  • parallelrun_step = ParallelRunStep( (CORRECT)
  •  name='batch-score',
  •  parallel_run_config=parallel_run_config,
  •  inputs=[batch_data_set.as_named_input('batch_data')],
  •  output=output_dir,
  •  arguments=[],
  •  allow_reuse=True)

Correct: This is the correct code for this task.

3. After the run of the pipeline has completed, which code can you use to retrieve the parallel_run_step.txt file from the output of the step?

  • df = pd.read_csv(result_file, delimiter=":", header=None)
  • df.columns = ["File", "Prediction"]
  • print(df)
  • prediction_run = next(pipeline_run.get_children())
  • prediction_output = prediction_run.get_output_data('inferences')
  • prediction_output.download(local_path='results')
  • for root, dirs, files in os.walk('results'): (CORRECT)
  •  for file in files:
  •   if file.endswith('parallel_run_step.txt'):
  •    result_file = os.path.join(root, file)

Correct: This code will find the parallel_run_step.txt file.

4. You want to define a search space for hyperparameter tuning. The batch_size hyperparameter can have the value 128, 256, or 512 and the learning_rate hyperparameter can have values from a normal distribution with a mean of 10 and a standard deviation of 3.

How can you code this in Python?

  • from azureml.train.hyperdrive import choice, uniform
  • param_space = {
  •  '--batch_size': choice(128, 256, 512),
  •  '--learning_rate': uniform(10, 3)
  •  }
  • from azureml.train.hyperdrive import choice, normal
  • param_space = {
  •  '--batch_size': choice(128, 256, 512),
  •  '--learning_rate': qnormal(10, 3)
  •  }
  • from azureml.train.hyperdrive import choice, normal
  • param_space = {
  •  '--batch_size': choice(128, 256, 512),
  •  '--learning_rate': lognormal(10, 3)
  •  }
  • from azureml.train.hyperdrive import choice, normal (CORRECT)
  • param_space = {
  •  '--batch_size': choice(128, 256, 512),
  •  '--learning_rate': normal(10, 3)
  •  }

Correct: This is the correct code for this task.

5. How does random sampling select values for hyperparameters?

  • From a mix of discrete and continuous values (CORRECT)
  • It tries to select parameter combinations that will result in improved performance from the previous selection
  • It tries every possible combination of parameters in the search space

Correct: Random sampling is used to randomly select a value for each hyperparameter, which can be a mix of discrete and continuous values.

6. True or False?

Bayesian sampling can be used only with choice, uniform and quniform parameter expressions, and it can be combined with an early-termination policy.

  • True
  • False (CORRECT)

Correct: You can only use Bayesian sampling with choice, uniform, and quniform parameter expressions, but you can’t combine it with an early-termination policy.

7. You want to implement a median stopping policy. How can you code this in Python?

  • from azureml.train.hyperdrive import MedianStoppingPolicy (CORRECT)
  • early_termination_policy = MedianStoppingPolicy(evaluation_interval=1,
  •  delay_evaluation=5)
  • from azureml.train.hyperdrive import MedianStoppingPolicy
  • early_termination_policy = MedianStoppingPolicy(truncation_percentage=10,
  •  evaluation_interval=1,
  •  delay_evaluation=5)
  • from azureml.train.hyperdrive import MedianStoppingPolicy
  • early_termination_policy = MedianStoppingPolicy(slack_amount=0.2,
  •  evaluation_interval=1,
  •  delay_evaluation=5)

Correct: This is the correct code for this task.

8. True or false?

You can use a bandit policy to stop a run if the target performance metric underperforms the best run so far by a specified margin.

  • True (CORRECT)
  • False

Correct: You can use a bandit policy to stop a run if the target performance metric underperforms the best run so far by a specified margin.

CONCLUSION – Deploy Batch Inference Pipelines And Tune Hyperparameters With Azure Machine Learning

By mastering the use of Azure Machine Learning for publishing batch inference pipelines and conducting cloud-scale experiments, you will be able to efficiently generate predictions from large datasets and optimize model performance. These skills will enable you to handle extensive prediction tasks effectively and ensure your models are finely tuned for accuracy and efficiency.