COURSE 3: BUILD AND OPERATE MACHINE LEARNING SOLUTIONS WITH AZURE
Module 4: Deploy Batch Inference Pipelines And Tune Hyperparameters With Azure Machine Learning
MICROSOFT AZURE DATA SCIENTIST ASSOCIATE (DP-100) PROFESSIONAL CERTIFICATE
Complete Coursera Study Guide
INTRODUCTION – Deploy Batch Inference Pipelines And Tune Hyperparameters With Azure Machine Learning
Machine learning models are frequently utilized to generate predictions from large datasets through batch processing. In this module, you’ll learn how to use Azure Machine Learning to publish a batch inference pipeline, allowing you to handle these extensive prediction tasks efficiently. Additionally, you’ll leverage cloud-scale experiments to select the optimal hyperparameter values for model training, ensuring your models perform at their best.
Learning Objectives
- Publish batch inference pipeline for a trained model.
- Use a batch inference pipeline to generate predictions.
- Define a hyperparameter search space.
- Configure hyperparameter sampling.
- Select an early-termination policy.
- Run a hyperparameter tuning experiment.
PRACTICE QUIZ: KNOWLEDGE CHECK
1. What is the terminology used for long-running tasks that operate on large volumes of data?
- Cluster operations
- Bunch operations
- Stack operations
- Batch operations (CORRECT)
Correct: Long-running tasks that operate on large volumes of data are performed as batch operations. In machine learning, batch inferencing is used to apply a predictive model to multiple cases asynchronously – usually writing the results to a file or database.
2. When creating a batch inferencing pipeline, which of the tasks below should be performed first?
- Run the pipeline and retrieve the step output
- Create a scoring script
- Register a model (CORRECT)
- Create a pipeline with a ParallelRunStep
Correct: This has to be the first step. To use a trained model in a batch inferencing pipeline, you must register it in your Azure Machine Learning workspace.
3. Which two functions are included in the scoring script of a batch inference pipeline? Select all that apply.
- init(batch)
- run(batch)
- init() (CORRECT)
- run(mini_batch) (CORRECT)
Correct: This function is called when the pipeline is initialized.
Correct: This function is called for each batch of data to be processed.
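The two functions above can be sketched as a minimal entry script. This is an illustrative sketch, not the exact lab code: the model path, the `AZUREML_MODEL_DIR` lookup, and the per-file processing are placeholder assumptions.

```python
# score.py: a minimal sketch of a batch-inference entry script.
# The model location and the per-file logic are placeholders, not
# the exact code from the lab.
import os
import pickle

model = None

def init():
    # Called once when the pipeline step is initialized:
    # load the registered model into a global for reuse.
    global model
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR", "."), "model.pkl")
    if os.path.exists(model_path):
        with open(model_path, "rb") as f:
            model = pickle.load(f)

def run(mini_batch):
    # Called for each mini-batch of files; must return one result
    # per input so the step can collate the outputs.
    results = []
    for file_path in mini_batch:
        # Real code would read the file and call model.predict(...)
        results.append(f"{os.path.basename(file_path)}: processed")
    return results
```

The important contract is that `run` returns one result per input item; the pipeline merges these per-batch results into the step's output.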
4. Which of the following describe the ParallelRunStep used in a pipeline for parallel batch inferencing? Select all that apply.
- Parameter
- Method
- Object (CORRECT)
- Class (CORRECT)
Correct: You can create objects from the ParallelRunStep class to be used in the pipeline.
Correct: You must import the ParallelRunStep class in order to use it in the pipeline.
5. After you run your pipeline, in which file can you observe the results?
- parallel_run_config
- OutputFileDatasetConfig
- parallel_run_step.txt (CORRECT)
Correct: You can retrieve the parallel_run_step.txt file from the output of the step to view the results.
PRACTICE QUIZ: KNOWLEDGE CHECK
1. What are hyperparameters?
- Values that are passed into a function
- Values determined from the training features
- Values used to configure training behavior which are not derived from the training data (CORRECT)
Correct: In machine learning, hyperparameters are values used to configure training behavior; unlike parameters, they are not derived from the training data.
2. The process of hyperparameter tuning consists of?
- Training multiple models, using the same algorithm, training data, and hyperparameter values.
- Training multiple models, using the same algorithm but different training data and different hyperparameter values.
- Training multiple models, using different algorithms, same training data and different hyperparameter values.
- Training multiple models, using the same algorithm and training data but different hyperparameter values. (CORRECT)
Correct: This is the process of hyperparameter tuning. The resulting model from each training run is then evaluated to determine the performance metric for which you want to optimize (for example, accuracy), and the best-performing model is selected.
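The selection loop described above can be sketched in a few lines. This is a conceptual illustration of the tuning process, not the HyperDrive implementation; `train_fn` and the candidate list are hypothetical stand-ins.

```python
# Conceptual sketch of hyperparameter tuning: train one model per
# hyperparameter combination (same algorithm, same data), evaluate
# each, and keep the best. Not the HyperDrive implementation.
def tune(train_fn, candidate_params):
    best_params, best_score = None, float("-inf")
    for params in candidate_params:
        score = train_fn(**params)  # train + evaluate one model
        if score > best_score:      # assume higher metric is better
            best_params, best_score = params, score
    return best_params, best_score
```

In Azure Machine Learning, HyperDrive plays the role of this loop: it launches the runs, collects the logged metric, and surfaces the best-performing run.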
3. Which of the following are valid discrete distributions from which you can select discrete values for discrete hyperparameters? Select all that apply.
- Qbasic
- Qloguniform (CORRECT)
- Qnormal (CORRECT)
- Quniform (CORRECT)
- Qlognormal (CORRECT)
Correct: This is a valid discrete distribution.
Correct: This is a valid discrete distribution.
Correct: This is a valid discrete distribution.
Correct: This is a valid discrete distribution.
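Conceptually, the q-prefixed distributions draw a continuous value and round it to a multiple of a step size q, which is what makes the result discrete. A rough pure-Python illustration (not the HyperDrive implementation, and the step-size rounding shown is an assumption about the general idea rather than the SDK's exact formula):

```python
# Sketch of q-distributions: a continuous draw rounded to a
# multiple of q, yielding discrete values.
import random

def quniform(low, high, q, rng=random):
    # Uniform draw from [low, high], rounded to a multiple of q.
    return round(rng.uniform(low, high) / q) * q

def qnormal(mu, sigma, q, rng=random):
    # Same rounding applied to a normal draw.
    return round(rng.gauss(mu, sigma) / q) * q
```

The same rounding idea extends to qlognormal and qloguniform over lognormal and log-uniform draws.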
4. Which of the following are valid types of sampling used in hyperparameter tuning? Select all that apply.
- Byzantine sampling
- Grid sampling (CORRECT)
- Bayesian sampling (CORRECT)
- Random sampling (CORRECT)
Correct: Grid sampling can only be employed when all hyperparameters are discrete.
Correct: Bayesian sampling chooses hyperparameter values based on the Bayesian optimization algorithm.
Correct: Random sampling is used to randomly select a value for each hyperparameter.
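Random sampling over a mixed search space can be illustrated in plain Python. This is a conceptual sketch, not the azureml SDK; the tuple-based search-space encoding and the hyperparameter names are hypothetical, loosely mirroring HyperDrive's choice()/uniform() expressions.

```python
# Conceptual sketch of random sampling: one draw per hyperparameter,
# discrete values via choice, continuous values via uniform.
import random

def sample_random(param_space, seed=None):
    rng = random.Random(seed)
    sampled = {}
    for name, (kind, args) in param_space.items():
        if kind == "choice":          # discrete hyperparameter
            sampled[name] = rng.choice(args)
        elif kind == "uniform":       # continuous hyperparameter
            low, high = args
            sampled[name] = rng.uniform(low, high)
        else:
            raise ValueError(f"unsupported expression: {kind}")
    return sampled

# Hypothetical search space in the hyperdrive style.
param_space = {
    "--batch_size": ("choice", [128, 256, 512]),
    "--learning_rate": ("uniform", (0.001, 0.1)),
}
```

Grid sampling would instead enumerate every combination of discrete values, while Bayesian sampling uses earlier results to pick the next combination to try.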
5. Which of the following are valid types of early termination policies you can implement? Select all that apply.
- Waiting policy
- Median stopping policy (CORRECT)
- Bandit policy (CORRECT)
- Truncation selection policy (CORRECT)
Correct: A median stopping policy abandons runs whose target performance metric is worse than the median of the running averages across all runs.
Correct: A bandit policy stops a run if the target performance metric underperforms the best run so far by a specified margin.
Correct: A truncation selection policy cancels the lowest-performing X% of runs at each evaluation interval, based on the truncation_percentage value you specify for X.
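The bandit and truncation checks can be sketched conceptually. This is not the HyperDrive code: the function names are made up, and the bandit formula shown (for a maximized metric, stop when the metric falls below best / (1 + slack_factor)) is the slack-factor form; HyperDrive also supports an absolute slack_amount variant.

```python
# Conceptual early-termination checks, not the azureml SDK.

def bandit_should_stop(run_metric, best_metric, slack_factor=0.1):
    # For a metric being maximized, the allowed floor is
    # best_metric / (1 + slack_factor); runs below it are stopped.
    return run_metric < best_metric / (1 + slack_factor)

def truncation_should_stop(run_metric, all_metrics, truncation_percentage=10):
    # Cancel a run if it sits in the lowest X% of runs at this
    # evaluation interval.
    cutoff = max(1, int(len(all_metrics) * truncation_percentage / 100))
    return run_metric in sorted(all_metrics)[:cutoff]
```

A median stopping policy would compare each run's metric against the median of the running averages instead of against the single best run.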
QUIZ: TEST PREP
1. To register a model using a reference to the Run used to train the model, which SDK commands can you use?
- from azureml.core import Model (CORRECT)
- run.register_model(model_name='classification_model',
- model_path='outputs/model.pkl',
- description='A classification model')
- from azureml.core import Object
- classification_model = Model.register(workspace=your_workspace,
- model_name='classification_model',
- model_path='model.pkl',
- description='A classification model')
- from azureml.core import Object
- run.register_model(model_name='classification_model',
- model_path='outputs/model.pkl',
- description='A classification model')
- from azureml.core import Model
- classification_model = Model.register(workspace=your_workspace,
- model_name='classification_model',
- model_path='model.pkl',
- description='A classification model')
Correct: These are the correct commands for the job.
2. Which of the following SDK commands can you use to create a parallel run step?
- parallelrun_step = ParallelRunStep(
- name='batch-score',
- parallel_run_config=parallel.run.config,
- inputs=[batch_data_set.as_named_input('batch_data')],
- output=output_dir,
- arguments=[],
- allow_reuse=True)
- parallelrun_step = ParallelRunStep(
- name='batch-score',
- parallel.run.config=parallel_run_config,
- inputs=[batch_data_set.as_named_input('batch_data')],
- output=output_dir,
- arguments=[],
- allow_reuse=True)
- parallelrun.step = ParallelRunStep(
- name='batch-score',
- parallel_run_config=parallel_run_config,
- inputs=[batch_data_set.as_named_input('batch_data')],
- output=output_dir,
- arguments=[],
- allow_reuse=True)
- parallelrun_step = ParallelRunStep( (CORRECT)
- name='batch-score',
- parallel_run_config=parallel_run_config,
- inputs=[batch_data_set.as_named_input('batch_data')],
- output=output_dir,
- arguments=[],
- allow_reuse=True)
Correct: This is the correct code for this task.
3. After the run of the pipeline has completed, which code can you use to retrieve the parallel_run_step.txt file from the output of the step?
- df = pd.read_csv(result_file, delimiter=":", header=None)
- df.columns = ["File", "Prediction"]
- print(df)
- prediction_run = next(pipeline_run.get_children())
- prediction_output = prediction_run.get_output_data('inferences')
- prediction_output.download(local_path='results')
- for root, dirs, files in os.walk('results'): (CORRECT)
- for file in files:
- if file.endswith('parallel_run_step.txt'):
- result_file = os.path.join(root, file)
Correct: This code will find the parallel_run_step.txt file.
4. You want to define a search space for hyperparameter tuning. The batch_size hyperparameter can have the value 128, 256, or 512 and the learning_rate hyperparameter can have values from a normal distribution with a mean of 10 and a standard deviation of 3.
How can you code this in Python?
- from azureml.train.hyperdrive import choice, uniform
- param_space = {
- '--batch_size': choice(128, 256, 512),
- '--learning_rate': uniform(10, 3)
- }
- from azureml.train.hyperdrive import choice, normal
- param_space = {
- '--batch_size': choice(128, 256, 512),
- '--learning_rate': qnormal(10, 3)
- }
- from azureml.train.hyperdrive import choice, normal
- param_space = {
- '--batch_size': choice(128, 256, 512),
- '--learning_rate': lognormal(10, 3)
- }
- from azureml.train.hyperdrive import choice, normal (CORRECT)
- param_space = {
- '--batch_size': choice(128, 256, 512),
- '--learning_rate': normal(10, 3)
- }
Correct: choice(128, 256, 512) covers the three discrete batch_size values, and normal(10, 3) samples learning_rate from a normal distribution with a mean of 10 and a standard deviation of 3.
5. How does random sampling select values for hyperparameters?
- From a mix of discrete and continuous values (CORRECT)
- It tries to select parameter combinations that will result in improved performance from the previous selection
- It tries every possible combination of parameters in the search space
Correct: Random sampling is used to randomly select a value for each hyperparameter, which can be a mix of discrete and continuous values.
6. True or False?
Bayesian sampling can be used only with choice, uniform and quniform parameter expressions, and it can be combined with an early-termination policy.
- True
- False (CORRECT)
Correct: You can only use Bayesian sampling with choice, uniform, and quniform parameter expressions, but you can’t combine it with an early-termination policy.
7. You want to implement a median stopping policy. How can you code this in Python?
- from azureml.train.hyperdrive import MedianStoppingPolicy (CORRECT)
- early_termination_policy = MedianStoppingPolicy(evaluation_interval=1,
- delay_evaluation=5)
- from azureml.train.hyperdrive import MedianStoppingPolicy
- early_termination_policy = MedianStoppingPolicy(truncation_percentage=10,
- evaluation_interval=1,
- delay_evaluation=5)
- from azureml.train.hyperdrive import MedianStoppinPolicy
- early_termination_policy = MedianStoppingPolicy(slack_amount=0.2,
- evaluation_interval=1,
- delay_evaluation=5)
Correct: This is the correct code for this task.
8. True or false?
You can use a bandit policy to stop a run if the target performance metric underperforms the best run so far by a specified margin.
- True (CORRECT)
- False
Correct: You can use a bandit policy to stop a run if the target performance metric underperforms the best run so far by a specified margin.
CONCLUSION – Deploy Batch Inference Pipelines And Tune Hyperparameters With Azure Machine Learning
By mastering the use of Azure Machine Learning for publishing batch inference pipelines and conducting cloud-scale experiments, you will be able to efficiently generate predictions from large datasets and optimize model performance. These skills will enable you to handle extensive prediction tasks effectively and ensure your models are finely tuned for accuracy and efficiency.

