COURSE 4: PERFORM DATA SCIENCE WITH AZURE DATABRICKS

Module 6: Train A Distributed Neural Network And Serve Models With Azure Machine Learning

MICROSOFT AZURE DATA SCIENTIST ASSOCIATE (DP-100) PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

Last updated:

INTRODUCTION – Train A Distributed Neural Network And Serve Models With Azure Machine Learning

In this module, you will learn how to use Uber’s Horovod framework together with the Petastorm library to run distributed deep learning training jobs on Spark, using training datasets in the Apache Parquet format. You will also learn how to use MLflow and the Azure Machine Learning service to register, package, and deploy a trained model to both Azure Container Instances and Azure Kubernetes Service, setting up a scoring web service.

Learning Objectives

  • Use MLflow to track experiments, log metrics, and compare runs
  • Understand hyperparameter tuning and its role in machine learning
  • Use modules from PySpark’s machine learning library for hyperparameter tuning and model selection

PRACTICE QUIZ: KNOWLEDGE CHECK 1

1. What is HorovodRunner?

  • A Python class
  • A framework
  • A general API (CORRECT)
  • A logging API

Correct: HorovodRunner is a general API for running distributed deep learning workloads on Databricks using Uber’s Horovod framework.

2. What does HorovodRunner expect a Python method containing deep learning training code to include?

  • URL
  • URI
  • Paths
  • Hooks (CORRECT)

Correct: HorovodRunner takes a Python method that contains deep learning training code with Horovod hooks. HorovodRunner pickles the method on the driver and distributes it to Spark workers.
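
For reference, a minimal sketch of what such a method can look like (build_model and train_dataset are hypothetical placeholders; the hooks shown are the standard Horovod ones for Keras):

    def train():
        # Imports live inside the method so they are pickled with it and
        # re-executed on every Spark worker.
        import tensorflow as tf
        import horovod.tensorflow.keras as hvd

        hvd.init()                                       # hook: start Horovod
        optimizer = tf.keras.optimizers.Adam(0.001 * hvd.size())
        optimizer = hvd.DistributedOptimizer(optimizer)  # hook: average gradients
        model = build_model()                            # hypothetical model builder
        model.compile(optimizer=optimizer, loss="mse")
        # hook: broadcast the initial weights from rank 0 to all workers
        callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
        model.fit(train_dataset, callbacks=callbacks)    # train_dataset assumed defined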

3. Which two methods are supported by the HorovodRunner API?

  • run(self, main, np, **kwargs)
  • init(self, main)
  • run(self, main, **kwargs) (CORRECT)
  • init(self, np) (CORRECT)

Correct: run(self, main, **kwargs) is supported by the HorovodRunner API; it runs a Horovod training job by invoking main(**kwargs). The main function and the keyword arguments are serialized using cloudpickle and distributed to the cluster workers.

Correct: init(self, np) is supported by the HorovodRunner API; it creates an instance of HorovodRunner initialized with the number of worker processes np.
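
Putting the two methods together, a minimal usage sketch (reusing a train() method like the one above; np=2 is an arbitrary worker count):

    from sparkdl import HorovodRunner  # shipped with the Databricks ML runtime

    hr = HorovodRunner(np=2)  # init(self, np): create an instance for two workers
    hr.run(train)             # run(self, main, **kwargs): pickle and launch train()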

4. Regarding the MPI concepts on which Horovod’s core principles are based, which MPI concept refers to the unique process ID?

  • Density
  • Rank (CORRECT)
  • Size
  • Local Rank

Correct: Rank is the unique process ID.
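
For concreteness, inside a running Horovod job these concepts map to the following calls (the values shown assume a hypothetical cluster of two nodes with four processes each):

    import horovod.tensorflow as hvd

    hvd.init()
    print(hvd.size())        # size: total number of processes, e.g. 8
    print(hvd.rank())        # rank: the unique process ID, 0 .. size-1
    print(hvd.local_rank())  # local rank: unique ID within one node, 0 .. 3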

5. True or false?

TensorFlow objects cannot be found or pickled using the HorovodRunner API.

  • True
  • False (CORRECT)

Correct: A common error is that TensorFlow objects cannot be found or pickled. This happens when the library import statements are not distributed to other executors. To avoid this issue, include all import statements (for example, import tensorflow as tf) both at the top of the Horovod training method and inside any other user-defined functions called in the Horovod training method.

PRACTICE QUIZ: KNOWLEDGE CHECK 2

1. To deploy a model to Azure ML, you must create or obtain an Azure ML Workspace.

You can do that programmatically by using a function.

Which of the following functions can you use to create the workspace?

  • azureml.core.environment.create()
  • azureml.core.Workspace.create() (CORRECT)
  • azureml.core.model.create()
  • azureml.core.dataset.workspace()

Correct: The azureml.core.Workspace.create() function will load a workspace of a specified name or create one if it does not already exist.
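
A minimal sketch of the call (every name below is a placeholder to substitute with your own values):

    from azureml.core import Workspace

    workspace = Workspace.create(name="my-workspace",
                                 subscription_id="<subscription-id>",
                                 resource_group="my-resource-group",
                                 location="eastus",
                                 exist_ok=True)  # load instead of failing if it exists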

2. You want to use Azure ML to train a Diabetes Model and build a container image for the trained model.

You will use the scikit-learn ElasticNet linear regression model.

You need to load the diabetes datasets. How should you code that?

  • diabetes = datasets.load_diabetes() (CORRECT)
  • X = diabetes.data
  • y = diabetes.target
  • diabetes = datasets_load_diabetes()
  • X = diabetes.data
  • y = diabetes.target
  • diabetes.tf = datasets.load_diabetes()
  • X = diabetes.data
  • y = diabetes.target
  • datasets = diabetes.load()
  • X = diabetes.data
  • y = diabetes.target

Correct: This is the correct code for the task.
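
In context, the loading step is usually followed by a train/test split before fitting the ElasticNet model; a sketch (the default split parameters are an assumption):

    from sklearn import datasets
    from sklearn.model_selection import train_test_split

    diabetes = datasets.load_diabetes()
    X = diabetes.data
    y = diabetes.target

    X_train, X_test, y_train, y_test = train_test_split(X, y)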

3. When working with Azure ML, you can use MLflow to build a container image for the trained model.

Which MLflow function can you use for that task?

  • mlflow.build_image()
  • azureml.mlflow.build_image()
  • mlflow.azureml.build_image() (CORRECT)
  • mlflow.azureml.build.image()

Correct: This function will build a container image for the trained MLflow model and will also register the MLflow model with a specified Azure ML workspace.
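
A sketch of the call as it appears in the MLflow 1.x releases this module targets (run_id, the "model" artifact path, and the image and model names are assumptions):

    import mlflow.azureml

    model_image, azure_model = mlflow.azureml.build_image(
        model_uri="runs:/{}/model".format(run_id),
        workspace=workspace,          # the Workspace object created earlier
        model_name="diabetes-model",
        image_name="diabetes-model-image",
        synchronous=True)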

4. Which kind of HTTP request can you send to the AKS webservice’s scoring endpoint to evaluate the sample data?

  • PUT
  • PATCH
  • POST (CORRECT)
  • GET

Correct: POST is used to send data to a server to create/update a resource. Query the AKS webservice’s scoring endpoint by sending an HTTP POST request that includes the input vector.
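
A sketch of such a request (scoring_uri and service_key would come from the deployed webservice, e.g. service.scoring_uri and service.get_keys(); sample is a hypothetical input vector):

    import json
    import requests

    headers = {"Content-Type": "application/json",
               "Authorization": "Bearer {}".format(service_key)}
    response = requests.post(scoring_uri,
                             data=json.dumps({"data": [sample]}),
                             headers=headers)
    print(response.json())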

5. Which Azure ML function can you use to replace the deployment’s existing model image with the new model image?

  • azureml.core.webservice.AksWebservice.update() (CORRECT)
  • azureml.core.webservice.AksWebservice.serialize()
  • azureml.core.webservice.AksWebservice.add_properties()
  • azureml.core.webservice.AksWebservice.deploy_configuration()

Correct: This function will update the webservice with the provided properties, which you can use to replace the current model image with the new model image.

QUIZ: TEST PREP

1. When developing a distributed training program using HorovodRunner, you would generally follow these steps:

1. Create a HorovodRunner instance initialized with the number of nodes.

2. Define a Horovod training method according to the methods described in Horovod usage, making sure to add any import statements inside the method.

3. Pass the training method to the HorovodRunner instance.

How would you code that in Python?

  • hr = HorovodRunner()
  • def train():
  •  import tensorflow as tf
  •  hvd.init(np)
  • hr.run(train)
  • hr = HorovodRunner(np=2) (CORRECT)
  • def train():
  •  import tensorflow as tf
  •  hvd.init()
  • hr.run(train)
  • hr = HorovodRunner(tf)
  • def train():
  •  import tensorflow as np
  •  hvd.init(2)
  • hr.run(train)
  • hr = HorovodRunner(np)
  • def train():
  •  import tensorflow as tf
  •  hvd.init()
  • hr.run(train)

Correct: This would be the correct code syntax.

2. You’re using Horovod to train a distributed neural network using Parquet files and Petastorm.

You have a dataset of housing prices in California named cal_housing.

After loading the data, you want to create a Spark DataFrame from the Pandas DataFrame so that you can concatenate the features and labels of the model.

How would you code that in Python?

  • data = pd.concat([pd.DataFrame(X_train, columns=cal_housing.feature_names), pd.DataFrame(y_train, columns=["label"])])
  • trainDF = spark.createDataFrame(data)
  • display(trainDF)
  • data = pd.concat([pd.DataFrame(X_train, columns=cal_housing.feature_names), pd.DataFrame(y_train, columns=["label"])], axis=1) (CORRECT)
  • trainDF = spark.createDataFrame(data)
  • display(trainDF)
  • data = pd.concat([pd.DataFrame(X_train, columns=cal_housing.feature_names), pd.DataFrame(y_train, columns=["label"])], axis=1)
  • trainDF = spark.createDataFrame()
  • display(trainDF)
  • data = pd.concat([pd.DataFrame(X_train, columns=cal_housing.feature_names), pd.DataFrame(y_train, columns=["label"])], axis=1)
  • trainDF = spark.DataFrame(data)
  • display(trainDF)

Correct: This would be the correct code syntax.
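
Written out as a single cell, with a note on why axis=1 matters (the first distractor omits it, which would stack the label rows under the feature rows instead of joining them column-wise):

    import pandas as pd

    # X_train, y_train, and cal_housing are assumed from the earlier
    # loading step.
    data = pd.concat([pd.DataFrame(X_train, columns=cal_housing.feature_names),
                      pd.DataFrame(y_train, columns=["label"])],
                     axis=1)  # axis=1 joins column-wise
    trainDF = spark.createDataFrame(data)  # `spark` is the Databricks session
    display(trainDF)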

3. You’re using Horovod to train a distributed neural network using Parquet files and Petastorm.

You have a dataset of housing prices in California named cal_housing.

After loading the data, you created a Spark DataFrame from the Pandas DataFrame so that you can concatenate the features and labels of the model.

Now you need to create Dense Vectors for the features.

How would you code that in Python?

  • from pyspark.ml.feature import VectorAssembler
  • vecAssembler = VectorAssembler(inputCols=cal_housing.feature_names, outputCol="features")
  • vecTrainDF = vecAssembler.transform(trainDF).hook("features", "label")
  • display(vecTrainDF)
  • from pyspark.ml.feature import VectorAssembler (CORRECT)
  • vecAssembler = VectorAssembler(inputCols=cal_housing.feature_names, outputCol="features")
  • vecTrainDF = vecAssembler.transform(trainDF).select("features", "label")
  • display(vecTrainDF)
  • from pyspark.ml.feature import VectorAssembler
  • vecAssembler = VectorAssembler(inputCols=cal_housing.feature_names, outputCol="features")
  • vecTrainDF = vecAssembler.transform(trainDF).call("features", "label")
  • display(vecTrainDF)
  • from pyspark.ml.feature import VectorAssembler
  • vecAssembler = VectorAssembler(inputCols=cal_housing.feature_names, outputCol="features")vecTrainDF = vecAssembler.transform(trainDF).select("features", "label")
  • display(vecTrainDF)

Correct: This is the correct code for the task.
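
As a runnable cell, the assembling step looks like this sketch (trainDF and cal_housing carry over from the previous question):

    from pyspark.ml.feature import VectorAssembler

    # Pack the feature columns into a single dense vector column "features".
    vecAssembler = VectorAssembler(inputCols=cal_housing.feature_names,
                                   outputCol="features")
    vecTrainDF = vecAssembler.transform(trainDF).select("features", "label")
    display(vecTrainDF)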

4. True or false?

Petastorm requires a Vector as an input, not an Array.

  • True
  • False (CORRECT)

Correct: It’s actually the other way around. Petastorm requires arrays as input, not vectors.
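
One way to satisfy that requirement on Spark 3.0+ is to unpack the assembled vector column before writing the Parquet files Petastorm will read (a sketch; vecTrainDF and the column names carry over from the previous questions):

    from pyspark.ml.functions import vector_to_array  # Spark 3.0+

    # Petastorm reads plain arrays, so convert the vector column first.
    arrayTrainDF = vecTrainDF.select(
        vector_to_array("features").alias("features"), "label")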

5. You’re working with Azure Machine Learning and you want to train a Diabetes Model and build a container image for the trained model.

You will use the scikit-learn ElasticNet linear regression model.

You want to deploy the model to production using Azure Kubernetes Service (AKS).

You don’t have an active AKS cluster, so you need to create one using the Azure ML SDK.

You’ll be using the default configuration.

How would you code that?

  • aks_target = ComputeTarget.workspace = workspace 
  •  (name = aks_cluster_name, 
  •  provisioning_configuration = prov_config)
  • aks_target = ComputeTarget.deploy(workspace = workspace, 
  •  name = aks_cluster_name, 
  •  provisioning_configuration = prov_config)
  • aks_target = ComputeTarget.create(workspace = workspace, (CORRECT)
  • name = aks_cluster_name, 
  • provisioning_configuration = prov_config)
  • aks_target = ComputeTarget.create(workspace = workspace, 
  •  name = aks_cluster_name,)

Correct: This is the correct code for this task. 
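
Filled out as a runnable sketch (using the default provisioning configuration as the question specifies; workspace and the cluster name are assumptions carried over from earlier steps):

    from azureml.core.compute import AksCompute, ComputeTarget

    prov_config = AksCompute.provisioning_configuration()  # default settings
    aks_cluster_name = "diabetes-aks"
    aks_target = ComputeTarget.create(workspace=workspace,
                                      name=aks_cluster_name,
                                      provisioning_configuration=prov_config)
    aks_target.wait_for_completion(show_output=True)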

6. You’re working with Azure Machine Learning and you want to train a Diabetes Model and build a container image for the trained model.

You will use the scikit-learn ElasticNet linear regression model.

You want to deploy the model to production using Azure Kubernetes Service (AKS).

You’ve created an AKS cluster for model deployment.

You’ve deployed the model’s image to the specified AKS cluster.

After you’ve trained a new model with different hyperparameters, you need to deploy the new model’s image to the AKS cluster.

How would you code that?

  • prod_webservice.update(image=model_image_updated) (CORRECT)
  • prod_webservice.wait_for_deployment(show_output = True)
  • prod_webservice.delete (image=model_image_updated)
  • prod_webservice.wait_for_deployment(show_output = True)
  • prod_webservice.deploy (image=model_image_updated)
  • prod_webservice.wait_for_deployment(show_output = True)
  • prod_webservice.create (image=model_image_updated)
  • prod_webservice.wait_for_deployment(show_output = True)

Correct: This is the correct code for this task.

7. After working with Azure Machine Learning, you want to clean up the deployments and terminate the “dev” ACI webservice using the Azure ML SDK.

Which method should do the job?

  • dev_webservice.delete() (CORRECT)
  • dev_webservice.remove()
  • dev_webservice.flush()
  • dev_webservice.terminate()

Correct: Because ACI manages compute resources on your behalf, deleting the “dev” ACI webservice will remove all resources associated with the “dev” model deployment.

CONCLUSION – Train A Distributed Neural Network And Serve Models With Azure Machine Learning

By the end of this module, you will be proficient in using Uber’s Horovod framework and the Petastorm library for distributed deep learning training on Spark with Apache Parquet datasets. You will also be able to use MLflow and the Azure Machine Learning service to register, package, and deploy models to Azure Container Instances and Azure Kubernetes Service, setting up a robust scoring web service.