COURSE 3: BUILD AND OPERATE MACHINE LEARNING SOLUTIONS WITH AZURE

Module 4: Deploy Batch Inference Pipelines And Tune Hyperparameters With Azure Machine Learning

MICROSOFT AZURE DATA SCIENTIST ASSOCIATE (DP-100) PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

Last updated:

INTRODUCTION – Deploy Batch Inference Pipelines And Tune Hyperparameters With Azure Machine Learning

Machine learning models are frequently utilized to generate predictions from large datasets through batch processing. In this module, you’ll learn how to use Azure Machine Learning to publish a batch inference pipeline, allowing you to handle these extensive prediction tasks efficiently. Additionally, you’ll leverage cloud-scale experiments to select the optimal hyperparameter values for model training, ensuring your models perform at their best.

Learning Objectives

  • Publish a batch inference pipeline for a trained model.
  • Use a batch inference pipeline to generate predictions.
  • Define a hyperparameter search space.
  • Configure hyperparameter sampling.
  • Select an early-termination policy.
  • Run a hyperparameter tuning experiment.

PRACTICE QUIZ: KNOWLEDGE CHECK

1. What is the terminology used for long-running tasks that operate on large volumes of data?

  • Cluster operations
  • Bunch operations
  • Stack operations
  • Batch operations (CORRECT)

Correct: Long-running tasks that operate on large volumes of data are performed as batch operations. In machine learning, batch inferencing is used to apply a predictive model to multiple cases asynchronously – usually writing the results to a file or database.

2. When creating a batch inferencing pipeline, which of the tasks below should be performed first?

  • Run the pipeline and retrieve the step output
  • Create a scoring script
  • Register a model (CORRECT)
  • Create a pipeline with a ParallelRunStep

Correct: This has to be the first step. To use a trained model in a batch inferencing pipeline, you must register it in your Azure Machine Learning workspace.

3. Which two functions are included in the scoring script of a batch inference pipeline? Select all that apply.

  • init(batch)
  • run(batch)
  • init() (CORRECT)
  • run(mini_batch) (CORRECT)

Correct: This function is called when the pipeline is initialized.

Correct: This function is called for each batch of data to be processed.
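The two functions above form the skeleton of the batch-scoring entry script. A minimal sketch of that control flow, with the model load stubbed out so it can run on its own (a real entry script would load the registered model via Model.get_model_path, as noted in the comments):

```python
# Sketch of a batch-scoring entry script. Assumption: a registered
# model named 'classification_model' saved with joblib; the load is
# stubbed here so the init()/run(mini_batch) flow is runnable locally.

model = None

def init():
    # Called once when the pipeline step starts: load the model here.
    global model
    # In a real entry script (requires an Azure ML run context):
    #   from azureml.core import Model
    #   import joblib
    #   model_path = Model.get_model_path('classification_model')
    #   model = joblib.load(model_path)
    model = lambda path: len(path) % 2   # stand-in predictor

def run(mini_batch):
    # Called for each mini-batch of input files; return one result per file.
    results = []
    for path in mini_batch:
        prediction = model(path)
        results.append(f"{path}: {prediction}")
    return results
```

The results returned by run() are collated by the pipeline step into the output file you retrieve later.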

4. What is the type of ParallelRunStep that must be used in the pipeline for parallel batch inferencing? Select all that apply.

  • Parameter
  • Method
  • Object (CORRECT)
  • Class (CORRECT)

Correct: You can create objects from the ParallelRunStep class to be used in the pipeline.

Correct: You must import the ParallelRunStep class in order to use it in the pipeline.

5. After you run your pipeline, in which file can you observe the results?

  • parallel_run_config
  • OutputFileDatasetConfig
  • parallel_run_step.txt (CORRECT)

Correct: You can retrieve the parallel_run_step.txt file from the output of the step to view the results.

PRACTICE QUIZ: KNOWLEDGE CHECK

1. What are hyperparameters?

  • Values that are passed into a function
  • Values determined from the training features
  • Values used to configure training behavior which are not derived from the training data (CORRECT)

Correct: Hyperparameters are values that configure training behavior; unlike model parameters, they are not derived from the training data.

2. What does the process of hyperparameter tuning consist of?

  • Training multiple models, using the same algorithm, training data, and hyperparameter values.
  • Training multiple models, using the same algorithm but different training data and different hyperparameter values.
  • Training multiple models, using different algorithms, same training data and different hyperparameter values.
  • Training multiple models, using the same algorithm and training data but different hyperparameter values. (CORRECT)

Correct: This is the process of hyperparameter tuning. The resulting model from each training run is then evaluated to determine the performance metric for which you want to optimize (for example, accuracy), and the best-performing model is selected.
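A toy illustration of the loop just described: the same algorithm and data are trained with different hyperparameter values, and the run with the best metric wins. Here train_and_score is a stand-in for a real training run; an actual experiment would submit HyperDrive child runs to a compute target.

```python
# Toy tuning loop: same algorithm and data, different hyperparameter
# values; keep the value whose evaluation metric is best.

def train_and_score(learning_rate):
    # Pretend training run whose "accuracy" peaks at learning_rate = 0.1.
    return 1.0 - abs(learning_rate - 0.1)

search_space = [0.001, 0.01, 0.1, 1.0]
best_value = max(search_space, key=train_and_score)
print(best_value)  # 0.1
```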

3. Which of the following are valid discrete distributions from which you can select discrete values for discrete hyperparameters? Select all that apply.

  • Qbasic
  • Qloguniform (CORRECT)
  • Qnormal (CORRECT)
  • Quniform (CORRECT)
  • Qlognormal (CORRECT)

Correct: This is a valid discrete distribution.

Correct: This is a valid discrete distribution.

Correct: This is a valid discrete distribution.

Correct: This is a valid discrete distribution.
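Each q-prefixed expression yields discrete values by drawing from the underlying continuous distribution and rounding to a multiple of q, i.e. round(draw / q) * q. A plain-Python sketch of that rounding semantics (not the azureml implementation itself):

```python
# Sketch of the q-distribution semantics: draw from the continuous
# distribution, then snap to the nearest multiple of q.
import random

def quniform(low, high, q):
    return round(random.uniform(low, high) / q) * q

random.seed(0)                 # fixed seed so the sketch is repeatable
value = quniform(1, 100, 10)
print(value)                   # a multiple of 10 within [0, 100]
```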

4. Which of the following are valid types of sampling used in hyperparameter tuning? Select all that apply.

  • Byzantine sampling
  • Grid sampling (CORRECT)
  • Bayesian sampling (CORRECT)
  • Random sampling (CORRECT)

Correct: Grid sampling can only be employed when all hyperparameters are discrete.

Correct: Bayesian sampling chooses hyperparameter values based on the Bayesian optimization algorithm.

Correct: Random sampling is used to randomly select a value for each hyperparameter.
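Grid sampling, the most restrictive of the three, tries every combination of the discrete values. A sketch of that enumeration using itertools.product (GridParameterSampling performs the equivalent over a choice-only search space):

```python
# Grid sampling enumerated by hand: every (batch_size, learning_rate)
# combination in a discrete search space.
from itertools import product

batch_sizes = [128, 256, 512]
learning_rates = [0.01, 0.1]

grid = list(product(batch_sizes, learning_rates))
print(len(grid))  # 6 combinations: 3 batch sizes x 2 learning rates
```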

5. Which of the following are valid types of early termination policies you can implement? Select all that apply.

  • Median stopping policy (CORRECT)
  • Waiting policy
  • Bandit policy (CORRECT)
  • Truncation selection policy (CORRECT)

Correct: A median stopping policy abandons runs whose target performance metric is worse than the median of the running averages across all runs.

Correct: A bandit policy stops a run if the target performance metric underperforms the best run so far by a specified margin.

Correct: A truncation selection policy cancels the lowest-performing X% of runs at each evaluation interval, based on the truncation_percentage value you specify for X.
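The bandit rule is simple enough to state in plain Python: stop a run when its metric trails the best run so far by more than the allowed slack. This sketch mirrors BanditPolicy's slack_amount parameter but is not the azureml implementation:

```python
# Bandit-policy rule sketched by hand: a run stops when it falls more
# than slack_amount behind the best metric observed so far.
def should_stop(current_metric, best_metric, slack_amount=0.2):
    return (best_metric - current_metric) > slack_amount

print(should_stop(0.70, 0.95))  # True: 0.25 behind, beyond the slack
print(should_stop(0.90, 0.95))  # False: only 0.05 behind
```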

QUIZ: TEST PREP

1. To register a model using a reference to the Run used to train the model, which SDK commands can you use?

  • from azureml.core import Model (CORRECT)
  • run.register_model(model_name='classification_model',
  •  model_path='outputs/model.pkl',
  •  description='A classification model')
  • from azureml.core import Object
  • classification_model = Model.register(workspace=your_workspace,
  •  model_name='classification_model',
  •  model_path='model.pkl',
  •  description='A classification model')
  • from azureml.core import Object
  • run.register_model(model_name='classification_model',
  •  model_path='outputs/model.pkl',
  •  description='A classification model')
  • from azureml.core import Model
  • classification_model = Model.register(workspace=your_workspace,
  •  model_name='classification_model',
  •  model_path='model.pkl',
  •  description='A classification model')

Correct: These are the correct commands for the job.

2. Which of the following SDK commands can you use to create a parallel run step?

  • parallelrun_step = ParallelRunStep(
  •  name='batch-score',
  •  parallel_run_config=parallel.run.config,
  •  inputs=[batch_data_set.as_named_input('batch_data')],
  •  output=output_dir,
  •  arguments=[],
  •  allow_reuse=True)
  • parallelrun_step = ParallelRunStep(
  •  name='batch-score',
  •  parallel.run.config=parallel_run_config,
  •  inputs=[batch_data_set.as_named_input('batch_data')],
  •  output=output_dir,
  •  arguments=[],
  •  allow_reuse=True)
  • parallelrun.step = ParallelRunStep(
  •  name='batch-score',
  •  parallel_run_config=parallel_run_config,
  •  inputs=[batch_data_set.as_named_input('batch_data')],
  •  output=output_dir,
  •  arguments=[],
  •  allow_reuse=True)
  • parallelrun_step = ParallelRunStep( (CORRECT)
  •  name='batch-score',
  •  parallel_run_config=parallel_run_config,
  •  inputs=[batch_data_set.as_named_input('batch_data')],
  •  output=output_dir,
  •  arguments=[],
  •  allow_reuse=True)

Correct: This is the correct code for this task.

3. After the run of the pipeline has completed, which code can you use to retrieve the parallel_run_step.txt file from the output of the step?

  • df = pd.read_csv(result_file, delimiter=":", header=None)
  • df.columns = ["File", "Prediction"]
  • print(df)
  • prediction_run = next(pipeline_run.get_children())
  • prediction_output = prediction_run.get_output_data('inferences')
  • prediction_output.download(local_path='results')
  • for root, dirs, files in os.walk('results'): (CORRECT)
  •  for file in files:
  •   if file.endswith('parallel_run_step.txt'):
  •    result_file = os.path.join(root, file)

Correct: This code will find the parallel_run_step.txt file.

4. You want to define a search space for hyperparameter tuning. The batch_size hyperparameter can have the value 128, 256, or 512 and the learning_rate hyperparameter can have values from a normal distribution with a mean of 10 and a standard deviation of 3.

How can you code this in Python?

  • from azureml.train.hyperdrive import choice, uniform
  • param_space = {
  •  '--batch_size': choice(128, 256, 512),
  •  '--learning_rate': uniform(10, 3)
  •  }
  • from azureml.train.hyperdrive import choice, normal
  • param_space = {
  •  '--batch_size': choice(128, 256, 512),
  •  '--learning_rate': qnormal(10, 3)
  •  }
  • from azureml.train.hyperdrive import choice, normal
  • param_space = {
  •  '--batch_size': choice(128, 256, 512),
  •  '--learning_rate': lognormal(10, 3)
  •  }
  • from azureml.train.hyperdrive import choice, normal (CORRECT)
  • param_space = {
  •  '--batch_size': choice(128, 256, 512),
  •  '--learning_rate': normal(10, 3)
  •  }

Correct: This is the correct code for this task.

5. How does random sampling select values for hyperparameters?

  • From a mix of discrete and continuous values (CORRECT)
  • It tries to select parameter combinations that will result in improved performance from the previous selection
  • It tries every possible combination of parameters in the search space

Correct: Random sampling is used to randomly select a value for each hyperparameter, which can be a mix of discrete and continuous values.

6. True or False?

Bayesian sampling can be used only with choice, uniform and quniform parameter expressions, and it can be combined with an early-termination policy.

  • True
  • False (CORRECT)

Correct: You can only use Bayesian sampling with choice, uniform, and quniform parameter expressions, but you can’t combine it with an early-termination policy.

7. You want to implement a median stopping policy. How can you code this in Python?

  • from azureml.train.hyperdrive import MedianStoppingPolicy (CORRECT)
  • early_termination_policy = MedianStoppingPolicy(evaluation_interval=1,
  •  delay_evaluation=5)
  • from azureml.train.hyperdrive import MedianStoppingPolicy
  • early_termination_policy = MedianStoppingPolicy(truncation_percentage=10,
  •  evaluation_interval=1,
  •  delay_evaluation=5)
  • from azureml.train.hyperdrive import MedianStoppingPolicy
  • early_termination_policy = MedianStoppingPolicy(slack_amount=0.2,
  •  evaluation_interval=1,
  •  delay_evaluation=5)

Correct: This is the correct code for this task.

8. True or false?

You can use a bandit policy to stop a run if the target performance metric underperforms the best run so far by a specified margin.

  • True (CORRECT)
  • False

Correct: You can use a bandit policy to stop a run if the target performance metric underperforms the best run so far by a specified margin.

CONCLUSION – Deploy Batch Inference Pipelines And Tune Hyperparameters With Azure Machine Learning

By mastering the use of Azure Machine Learning for publishing batch inference pipelines and conducting cloud-scale experiments, you will be able to efficiently generate predictions from large datasets and optimize model performance. These skills will enable you to handle extensive prediction tasks effectively and ensure your models are finely tuned for accuracy and efficiency.