COURSE 5: PREPARE FOR DP-100: DATA SCIENCE ON MICROSOFT AZURE EXAM
Module 4: Exam Preparation Course 3
MICROSOFT AZURE DATA SCIENTIST ASSOCIATE (DP-100) PROFESSIONAL CERTIFICATE
Complete Coursera Study Guide
Last updated:
INTRODUCTION – Exam Preparation Course 3
In this module, you will thoroughly review the content covered in Course 3 of the Microsoft Azure Data Scientist Associate specialization. This comprehensive module will revisit key concepts, methodologies, and tools essential for data science professionals working within the Azure ecosystem.
By engaging with this material, you’ll reinforce your understanding of the advanced topics and practical applications taught in Course 3, ensuring you are well-prepared for real-world data science challenges using Microsoft Azure’s suite of services and solutions.
Learning Objectives
- Outline the key points covered in the Microsoft Azure Data Scientist Associate specialization
- Recap the main topics in Course 3: Build and operate machine learning solutions with Azure Machine Learning
- Assess knowledge and skills in building and operating machine learning solutions with Azure Machine Learning
Quiz: Build and operate machine learning solutions with Azure Machine Learning
1. You create an Azure Machine Learning workspace. You are preparing a local Python environment on a laptop computer.
You want to use the laptop to connect to the workspace and run experiments.
You create the following config.json file:
{ "workspace_name": "ml-workspace" }
You must use the Azure Machine Learning SDK to interact with data and experiments in the workspace. You need to configure the config.json file to connect to the workspace from the Python environment. Which two additional parameters must you add to the config.json file in order to connect to the workspace? Each correct answer presents part of the solution.
- region
- login
- key
- resource_group (CORRECT)
- subscription_id (CORRECT)
Correct: The resource_group parameter must be specified.
Correct: The subscription_id parameter must be specified.
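For reference, a minimal sketch of the completed config.json and connection code, assuming the azureml-core SDK (v1) is installed locally; the subscription and resource group values are placeholders:

```python
# The completed config.json would look like:
# {
#     "workspace_name": "ml-workspace",
#     "subscription_id": "<your-subscription-id>",
#     "resource_group": "<your-resource-group>"
# }
from azureml.core import Workspace

# Reads config.json from the current directory (or .azureml/) by default
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group)
```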
2. You are developing a data science workspace that uses an Azure Machine Learning service. You need to select a compute target to deploy the workspace. What should you use?
- Apache Spark for HDInsight
- Azure Databricks
- Azure Data Lake Analytics
- Azure Container Instances (CORRECT)
Correct: Azure Container Instances can be used as a compute target for testing or development. Use it for low-scale, CPU-based workloads that require less than 48 GB of RAM.
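As a quick illustration, a hedged sketch of an ACI deployment, assuming ws, model, and inference_config are already defined (SDK v1); the service name is illustrative:

```python
from azureml.core.model import Model
from azureml.core.webservice import AciWebservice

# ACI suits low-scale, CPU-based dev/test workloads (< 48 GB RAM)
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(ws, 'aci-dev-service', [model],
                       inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
```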
3. A coworker registers a datastore in a Machine Learning services workspace by using the following code:
Datastore.register_azure_blob_container(workspace=ws,
                                        datastore_name='demo_datastore',
                                        container_name='demo_datacontainer',
                                        account_name='demo_account',
                                        account_key='0A0A0A-0A00A0A-0A0A0A0A0A0',
                                        create_if_not_exists=True)
You need to write code to access the datastore from a notebook. How should you complete the code segment?
- import azureml.core
- from azureml.core import Workspace, Datastore
- ws = Workspace.from_config()
- datastore = <add answer here>.get(<add answer here>, '<add answer here>')
- Run, experiment, demo_datastore
- Experiment, run, demo_account
- Run, ws, demo_datastore
- Datastore, ws, demo_datastore (CORRECT)
Correct: To get a specific datastore registered in the current workspace, use the get() static method on the Datastore class, like this:
datastore = Datastore.get(ws, datastore_name='your datastore name')
4. A set of CSV files contains sales records. All the CSV files have the same data schema.
Each CSV file contains the sales record for a particular month and has the filename sales.csv. Each file is stored in a folder that indicates the month and year when the data was recorded. The folders are in an Azure blob container for which a datastore has been defined in an Azure Machine Learning workspace. The folders are organized in a parent folder named sales to create the following hierarchical structure:
/sales
/01-2019
/sales.csv
/02-2019
/sales.csv
/03-2019
/sales.csv
…
At the end of each month, a new folder with that month’s sales file is added to the sales folder. You plan to use the sales data to train a machine learning model based on the following requirements:
– You must define a dataset that loads all of the sales data to date into a structure that can be easily converted to a dataframe.
– You must be able to create experiments that use only data that was created before a specific previous month, ignoring any data that was added after that month.
– You must register the minimum number of datasets possible.
You need to register the sales data as a dataset in Azure Machine Learning service workspace. What should you do?
- Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset each month, replacing the existing dataset and specifying a tag named month indicating the month and year it was registered. Use this dataset for all experiments.
- Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file. Register the dataset with the name sales_dataset each month as a new version and with a tag named month indicating the month and year it was registered. Use this dataset for all experiments, identifying the version to be used based on the month tag as necessary. (CORRECT)
- Create a new tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset_MM-YYYY each month with appropriate MM and YYYY values for the month and year. Use the appropriate month-specific dataset for experiments.
- Create a tabular dataset that references the datastore and specifies the path 'sales/*/sales.csv', register the dataset with the name sales_dataset and a tag named month indicating the month and year it was registered, and use this dataset for all experiments.
Correct: Registering each month's data as a new version of a single dataset minimizes the number of registered datasets, while the month tag lets an experiment select a version containing only data created up to a specific month.
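For reference, a minimal sketch of the versioned-dataset approach (SDK v1); the datastore name, tag value, and version number are illustrative:

```python
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()
datastore = Datastore.get(ws, 'demo_datastore')

# The wildcard path picks up every month's sales.csv under /sales
sales_ds = Dataset.Tabular.from_delimited_files(path=(datastore, 'sales/*/sales.csv'))

# Each month, register a new version of the same dataset with a month tag
sales_ds = sales_ds.register(workspace=ws,
                             name='sales_dataset',
                             create_new_version=True,
                             tags={'month': '03-2019'})

# An experiment can pin an earlier version to ignore later months
earlier = Dataset.get_by_name(ws, name='sales_dataset', version=2)
df = earlier.to_pandas_dataframe()
```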
5. You create a deep learning model for image recognition on Azure Machine Learning service using GPU-based training. You must deploy the model to a context that allows for real-time GPU-based inferencing.
You need to configure compute resources for model inferencing. Which compute type should you use?
- Azure Kubernetes Service (CORRECT)
- Azure Container Instance
- Field Programmable Gate Array
- Machine Learning Compute
Correct: You can use Azure Machine Learning to deploy a GPU-enabled model as a web service. Deploying a model on Azure Kubernetes Service (AKS) is a viable option. The AKS cluster provides a GPU resource that is used by the model for inference.
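A hedged sketch of such a deployment follows, assuming a GPU-enabled AKS cluster is already attached to the workspace and that ws, model, and inference_config are defined elsewhere; the cluster and service names are illustrative:

```python
from azureml.core.model import Model
from azureml.core.webservice import AksWebservice
from azureml.core.compute import AksCompute

# Retrieve an existing GPU-enabled AKS cluster attached to the workspace
aks_target = AksCompute(ws, 'gpu-aks')

# Request a GPU for each service replica
deployment_config = AksWebservice.deploy_configuration(cpu_cores=1,
                                                       memory_gb=4,
                                                       gpu_cores=1)

service = Model.deploy(ws, 'gpu-inference-service', [model],
                       inference_config, deployment_config,
                       deployment_target=aks_target)
service.wait_for_deployment(show_output=True)
```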
6. You use Azure Machine Learning designer to create a real-time service endpoint. You have a single Azure Machine Learning service compute resource.
You train the model and prepare the real-time pipeline for deployment.
You need to publish the inference pipeline as a web service. Which compute type should you use?
- HDInsight
- Azure Kubernetes Service (CORRECT)
- A new Machine Learning Compute resource
- Azure Databricks
- The existing Machine Learning Compute resource
Correct: Azure Kubernetes Service (AKS) is the recommended compute target for deploying a real-time inference pipeline as a production web service.
7. You deploy a model as an Azure Machine Learning real-time web service using the following code.
# ws, model, inference_config, and deployment_config defined previously
service = Model.deploy(ws, 'classification-service', [model], inference_config, deployment_config)
service.wait_for_deployment(True)
The deployment fails.
You need to troubleshoot the deployment failure by determining the actions that were performed during deployment and identifying the specific action that failed.
Which code segment should you run?
- service.update_deployment_state()
- service.serialize()
- service.get_logs() (CORRECT)
- service.state
Correct: You can print out detailed Docker engine log messages from the service object.
You can view the log for ACI, AKS, and Local deployments.
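For reference, a minimal sketch of retrieving the logs from the service object:

```python
# Print the deployment/Docker logs to identify the failed action
logs = service.get_logs()
print(logs)
```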
8. You register a model that you plan to use in a batch inference pipeline.
The batch inference pipeline must use a ParallelRunStep step to process files in a file dataset, and the script must process six input files each time the inferencing function is called.
You need to configure the pipeline. Which configuration setting should you specify in the ParallelRunConfig object for the ParallelRunStep step?
- error_threshold="6"
- node_count="6"
- process_count_per_node="6"
- mini_batch_size="6" (CORRECT)
Correct: For FileDataset input, this field is the number of files a user script can process in one run() call. For TabularDataset input, this field is the approximate size of data the user script can process in one run() call.
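A short sketch of such a configuration follows (SDK v1); batch_env, compute_target, input_dataset, and output_dir are assumed to be defined elsewhere, and all names are illustrative:

```python
from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

parallel_run_config = ParallelRunConfig(
    source_directory='scripts',
    entry_script='batch_scoring.py',
    mini_batch_size='6',            # for a FileDataset: 6 files per run() call
    error_threshold=10,
    output_action='append_row',
    environment=batch_env,          # assumed Environment object
    compute_target=compute_target,  # assumed AmlCompute cluster
    node_count=2)

parallel_run_step = ParallelRunStep(
    name='batch-inference',
    parallel_run_config=parallel_run_config,
    inputs=[input_dataset.as_named_input('batch_data')],
    output=output_dir)              # assumed OutputFileDatasetConfig
```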
9. Yes or No?
You train a classification model by using a logistic regression algorithm. You must be able to explain the model’s predictions by calculating the importance of each feature, both as an overall global relative importance value and as a measure of local importance for a specific set of predictions.
You need to create an explainer that you can use to retrieve the required global and local feature importance values.
Solution: Create a TabularExplainer. Does the solution meet the goal?
- Yes (CORRECT)
- No
Correct: The TabularExplainer supports both global and local feature importance explanations.
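For illustration, a hedged sketch using TabularExplainer, assuming the azureml-interpret package is installed and that model, X_train, X_test, and feature_names are defined elsewhere:

```python
from interpret.ext.blackbox import TabularExplainer

explainer = TabularExplainer(model, X_train, features=feature_names)

# Overall (global) relative feature importance
global_explanation = explainer.explain_global(X_test)
print(global_explanation.get_feature_importance_dict())

# Local importance for a specific set of predictions
local_explanation = explainer.explain_local(X_test[0:5])
print(local_explanation.get_ranked_local_names())
```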
10. You deploy a real-time inference service for a trained model.
The deployed model supports a business-critical application, and it is important to be able to monitor the data submitted to the web service and the predictions the data generates.
You need to implement a monitoring solution for the deployed model using minimal administrative effort. What should you do?
- Enable Azure Application Insights for the service endpoint and view logged data in the Azure portal. (CORRECT)
- View the log files generated by the experiment used to train the model.
- Create an MLflow tracking URI that references the endpoint, and view the data logged by MLflow.
- View the explanations for the registered model in Azure ML studio.
Correct: You can also enable Azure Application Insights from Azure Machine Learning studio.
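For reference, a minimal sketch of enabling Application Insights on an already-deployed service (SDK v1):

```python
# Turns on logging of request data and predictions for the endpoint;
# the logged data can then be viewed in the Azure portal
service.update(enable_app_insights=True)
```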
11. You are a lead data scientist for a project that tracks the health and migration of birds. You create a multi-class image classification deep learning model that uses a set of labeled bird photographs collected by experts.
You have 100,000 photographs of birds. All photographs use the JPG format and are stored in an Azure blob container in an Azure subscription. You need to access the bird photograph files in the Azure blob container from the Azure Machine Learning service workspace that will be used for deep learning model training.
You must minimize data movement. What should you do?
- Create an Azure Cosmos DB database and attach the Azure Blob containing bird photographs storage to the database.
- Create an Azure Data Lake store and move the bird photographs to the store.
- Register the Azure blob storage containing the bird photographs as a datastore in Azure Machine Learning service. (CORRECT)
- Create and register a dataset by using the TabularDataset class that references the Azure blob storage containing bird photographs.
- Copy the bird photographs to the blob datastore that was created with your Azure Machine Learning service workspace.
Correct: We recommend creating a datastore for an Azure Blob container. When you create a workspace, an Azure blob container and an Azure file share are automatically registered to the workspace.
12. An organization creates and deploys a multi-class image classification deep learning model that uses a set of labeled photographs.
The software engineering team reports there is a heavy inferencing load for the prediction web services during the summer. The production web service for the model fails to meet demand despite having a fully-utilized compute cluster where the web service is deployed.
You need to improve performance of the image classification web service with minimal downtime and minimal administrative effort. What should you advise the IT Operations team to do?
- Increase the VM size of nodes in the compute cluster where the web service is deployed.
- Increase the node count of the compute cluster where the web service is deployed.
- Increase the minimum node count of the compute cluster where the web service is deployed. (CORRECT)
- Create a new compute cluster by using larger VM sizes for the nodes, redeploy the web service to that cluster, and update the DNS registration for the service endpoint to point to the new cluster.
Correct: Increasing the minimum node count of the existing compute cluster adds always-available capacity for the deployed web service without redeploying it, meeting demand with minimal downtime and administrative effort.
13. You use the Azure Machine Learning Python SDK to define a pipeline that consists of multiple steps.
When you run the pipeline, you observe that some steps do not run. The cached output from a previous run is used instead. You need to ensure that every step in the pipeline is run, even if the parameters and contents of the source directory have not changed since the previous run.
What are two possible ways to achieve this goal? Each correct answer presents a complete solution.
- Restart the compute cluster where the pipeline experiment is configured to run.
- Use a PipelineData object that references a datastore other than the default datastore.
- Set the allow_reuse property of each step in the pipeline to False. (CORRECT)
- Set the outputs property of each step in the pipeline to True.
- Set the regenerate_outputs property of the pipeline to True. (CORRECT)
Correct: Keep the following in mind when working with pipeline steps, input/output data, and step reuse: if data used in a step is in a datastore and allow_reuse is True, then changes to the data won't be detected. If the data is uploaded as part of the snapshot (under the step's source_directory), though this is not recommended, the hash will change and trigger a rerun. Setting allow_reuse to False forces every step to run on each submission.
Correct: If regenerate_outputs is set to True, a new submit will always force generation of all step outputs, and disallow data reuse for any step of this run. Once this run is complete, however, subsequent runs may reuse the results of this run.
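A minimal sketch showing both solutions follows, assuming the training script, compute target, and workspace objects are defined elsewhere:

```python
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

# Solution 1: disable output caching on each step
step = PythonScriptStep(name='train',
                        script_name='train.py',
                        compute_target=compute_target,
                        source_directory='scripts',
                        allow_reuse=False)

# Solution 2: force regeneration of all step outputs for this submission
pipeline = Pipeline(workspace=ws, steps=[step])
run = Experiment(ws, 'pipeline-experiment').submit(pipeline,
                                                   regenerate_outputs=True)
```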
14. You train and register a model in your Azure Machine Learning workspace.
You must publish a pipeline that enables client applications to use the model for batch inferencing.
You must use a pipeline with a single ParallelRunStep step that runs a Python inferencing script to get predictions from the input data.
You need to create the inferencing script for the ParallelRunStep pipeline step.
Which two functions should you include? Each correct answer presents part of the solution.
- batch()
- score(mini_batch)
- main()
- run(mini_batch) (CORRECT)
- init() (CORRECT)
Correct: run(mini_batch) is called for each batch of data to be processed.
Correct: init() is called once when the worker process starts, before any batches are processed; it is typically used for expensive setup such as loading the model.
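For reference, a minimal sketch of a ParallelRunStep entry script; the registered model name 'classifier_model' is illustrative, and the input is assumed to be a FileDataset of CSV files:

```python
import joblib
import pandas as pd
from azureml.core.model import Model

def init():
    # Called once per worker process: load the registered model
    global model
    model_path = Model.get_model_path('classifier_model')
    model = joblib.load(model_path)

def run(mini_batch):
    # Called per mini-batch; for a FileDataset, mini_batch is a list of file paths
    results = []
    for file_path in mini_batch:
        data = pd.read_csv(file_path)
        predictions = model.predict(data)
        results.append(f'{file_path}: {predictions.tolist()}')
    return results
```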
15. An organization uses Azure Machine Learning service and wants to expand their use of machine learning. You have the following compute environments. The organization does not want to create another compute environment.
| Environment name | Compute Type |
|---|---|
| nb_server | Compute instance |
| aks_cluster | Azure Kubernetes Service |
| mlc_cluster | Machine Learning compute |
You need to determine which compute environment to use for the following scenarios:
1. Run an Azure Machine Learning Designer training pipeline.
2. Deploy a web service from Azure Machine Learning Designer.
Which compute types should you use?
- 1 nb_server, 2 mlc_cluster
- 1 mlc_cluster, 2 nb_server
- 1 nb_server, 2 aks_cluster
- 1 mlc_cluster, 2 aks_cluster (CORRECT)
Correct: Designer training pipelines run on a Machine Learning compute cluster (mlc_cluster), and web services deployed from the Designer require an Azure Kubernetes Service cluster (aks_cluster). Note that this question is dated; deployment options have since expanded, but AKS remains the deployment target here.
16. You train a classification model by using a logistic regression algorithm. You must be able to explain the model's predictions by calculating the importance of each feature, both as an overall global relative importance value and as a measure of local importance for a specific set of predictions.
You need to create an explainer that you can use to retrieve the required global and local feature importance values.
Solution: Create a PFIExplainer. Does the solution meet the goal?
- Yes
- No (CORRECT)
Correct: The PFIExplainer doesn’t support local feature importance explanations.
17. You create an Azure Machine Learning compute resource to train models. The compute resource is configured as follows:
– Minimum nodes: 2
– Maximum nodes: 4
You must decrease the minimum number of nodes and increase the maximum number of nodes to the following values:
– Minimum nodes: 0
– Maximum nodes: 8
You need to reconfigure the compute resource. What are three possible ways to achieve this goal? Each correct answer presents a complete solution.
- Run the refresh_state() method of the BatchCompute class in the Python SDK.
- Use the Azure Machine Learning Designer.
- Use the Azure portal. (CORRECT)
- Run the update method of the AmlCompute class in the Python SDK. (CORRECT)
- Use the Azure Machine Learning Studio. (CORRECT)
Correct: To change the nodes in the cluster, use the UI for your cluster in the Azure portal.
Correct: The update(min_nodes=None, max_nodes=None, idle_seconds_before_scaledown=None) of the AmlCompute class updates the ScaleSettings for this AmlCompute target.
Correct: You can manage assets and resources in the Azure Machine Learning studio.
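For reference, a minimal sketch of the SDK approach, assuming a cluster named mlc_cluster exists in the workspace:

```python
from azureml.core.compute import AmlCompute

# Retrieve the existing cluster and update its scale settings
compute_target = AmlCompute(ws, 'mlc_cluster')
compute_target.update(min_nodes=0, max_nodes=8)
```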
18. You create a new Azure subscription. No resources are provisioned in the subscription. You need to create an Azure Machine Learning workspace.
What are three possible ways to achieve this goal? Each correct answer presents a complete solution.
- Navigate to Azure Machine Learning studio and create a workspace.
- Run Python code that uses the Azure ML SDK library and calls the Workspace.get method with name, subscription_id, and resource_group parameters.
- Run Python code that uses the Azure ML SDK library and calls the Workspace.create method with name, subscription_id, resource_group, and location parameters. (CORRECT)
- Use the Azure Command Line Interface (CLI) with the Azure Machine Learning extension to call the az group create function with --name and --location parameters, and then the az ml workspace create function, specifying -w and -g parameters for the workspace name and resource group. (CORRECT)
- Use an Azure Resource Manager template that includes a Microsoft.MachineLearningServices/workspaces resource and its dependencies. (CORRECT)
Correct: This is one way to achieve the goal.
Correct: This is one way to achieve the goal.
Correct: This is one way to achieve the goal.
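For reference, a minimal sketch of the SDK approach; the names and region are placeholders:

```python
from azureml.core import Workspace

ws = Workspace.create(name='ml-workspace',
                      subscription_id='<your-subscription-id>',
                      resource_group='ml-resources',
                      location='eastus',
                      create_resource_group=True)
ws.write_config()  # saves config.json for later Workspace.from_config() calls
```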
CONCLUSION – Exam Preparation Course 3
In conclusion, this module provides a detailed review of Course 3 from the Microsoft Azure Data Scientist Associate specialization. By revisiting the key concepts, methodologies, and tools, you will solidify your understanding and enhance your practical skills in data science within the Azure ecosystem. This thorough review ensures you are well-equipped to tackle real-world data science challenges using Microsoft's comprehensive suite of Azure services and solutions.