COURSE 2 – HARNESSING THE POWER OF DATA WITH POWER BI

Module 2: The Right Tools For the Job

MICROSOFT POWER BI DATA ANALYST PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

INTRODUCTION – The right tools for the job

This module introduces learners to data management. It begins with the steps involved in data collection, emphasizing the importance of acquiring accurate and relevant information from diverse data sources. It then introduces the ETL process, a cornerstone of data integration, which involves extracting data from various sources, transforming it into a suitable format, and loading it into a destination system for analysis.

The module also highlights the importance of evaluating data before analysis to confirm its quality, reliability, and suitability for generating meaningful insights. Together, these topics give learners a foundation in handling and preparing data for analysis.

Learning Objectives

  • Describe the basic concepts of how data is produced and gathered in businesses and in organizations
  • Explain the ETL process
  • Explain why transforming and cleaning data is required

SELF-REVIEW: DATASET

1. What is the primary task assigned to you by Renee in the Assessing a data set case study?

  • To sort and list the prices of all products, displaying them in descending order
  • To list the products in ascending order by the Supplier and Date Entered columns, respectively.
  • To list the products purchased from suppliers in ascending order by the Date Entered column.
  • To track the price increase on a supplier basis from the past to the present for each product the company purchases from suppliers and detect any unusual price situations, if any. (CORRECT)

That’s correct! The primary goal is to track supplier price increases from the past to the present for each product purchased and detect any abnormal price situations, if any. 

2. In the Assessing a data set case study, which of the following columns need to be considered to track the price increase on a supplier basis from the past to the present for each product and detect anomalies? Select all that apply.

  • Product name (CORRECT)
  • Supplier (CORRECT)
  • Category
  • Date Entered (CORRECT)

That’s correct! The ProductName column is required for this task.

Correct! In the current dataset, products are purchased from the same supplier. This may vary in the future, giving different price ranges on a supplier basis. Therefore, additional sorting on a supplier basis would be beneficial for the list.

Correct! The DateEntered column will be used to track the list from past to present.
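
The task in this case study can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the sample rows, the product and supplier names, and the 25% threshold for an "unusual" change are hypothetical and do not come from the case study dataset. The sketch groups prices by ProductName and Supplier, orders each group by DateEntered, and flags large changes between consecutive entries.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (ProductName, Supplier, DateEntered, Price).
rows = [
    ("Helmet", "Contoso", date(2023, 3, 1), 22.0),
    ("Helmet", "Contoso", date(2023, 1, 1), 20.0),
    ("Helmet", "Contoso", date(2023, 2, 1), 21.0),
    ("Gloves", "Contoso", date(2023, 1, 1), 10.0),
    ("Gloves", "Contoso", date(2023, 2, 1), 16.0),
]


def price_changes(rows, threshold=0.25):
    """Return (product, supplier, date, pct_change, unusual) for each price change."""
    history = defaultdict(list)
    for product, supplier, entered, price in rows:
        history[(product, supplier)].append((entered, price))

    changes = []
    for (product, supplier), entries in sorted(history.items()):
        entries.sort()  # order each group from past to present by DateEntered
        for (_, prev), (entered, curr) in zip(entries, entries[1:]):
            pct = (curr - prev) / prev
            # The threshold is an assumed cut-off for an "unusual" change.
            unusual = abs(pct) > threshold
            changes.append((product, supplier, entered, round(pct, 3), unusual))
    return changes
```

All three columns named in the answer are used: ProductName and Supplier define the groups, and DateEntered defines the order within each group.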

3. Based on your analysis of the data in the Assessing a data set case study, is this dataset suitable for the business need as determined by the task assigned to you by Renee?

  • No
  • Yes (CORRECT)

Correct! The dataset contains the required fields for the business need.

4. What type of data is primarily collected, stored, and interpreted by Adventure Works’ Enterprise Resource Planning (ERP) system?

  • Photo and video data
  • Unstructured data
  • Structured data (CORRECT)
  • Semi-structured data

That’s correct! ERP systems are designed to collect, store, manage, and interpret structured data from various business activities. Structured data is data that is organized into a formatted repository, typically a database, so it’s easily searchable.

5. Which of the following is the SaaS (Software as a Service) based web application of Microsoft Power BI?

  • Power BI Service (CORRECT)
  • Power BI Apps
  • Power BI Desktop

That’s correct! Power BI Service is the web application, which is a SaaS (Software as a Service) service used by report users and administrators.

KNOWLEDGE CHECK: DATA COLLECTION

1. Which product has strong reporting features and is typically used to begin a workflow in Power BI?

  • Microsoft Power BI Service
  • Microsoft Power BI Apps
  • Microsoft Power BI Desktop (CORRECT)

That’s correct! Microsoft Power BI Desktop is a Windows-based desktop application that is mainly used to design and create reports.

2. If you are given Microsoft Excel data and informed of a business need, what method would you use to determine if the data provided is compatible with the business need?

  • Check if the Microsoft Excel data fulfills the business requirements by examining the format of the data.
  • Check if the Microsoft Excel data fulfills the business requirements by examining its content and data types. (CORRECT)
  • Check if the Microsoft Excel data fulfills the business requirements by examining the source of the data.

That’s correct! Microsoft Excel data columns and business requirement details have to match in order to fulfill the requirement.

3. You want to publish your report and share your data with others by creating dashboards. Which of the following products would you use to accomplish this?

  • Microsoft Power BI Service (CORRECT)
  • Microsoft Power BI Desktop
  • Microsoft Power BI Apps

That’s correct! You can publish your report and share your data with others by creating dashboards on Microsoft Power BI Service.

4. True or False: The typical workflow in Microsoft Power BI starts with the creation of a report in Power BI Desktop. 

  • True (CORRECT)
  • False

That’s correct! The Power BI Workflow usually starts with Power BI Desktop and continues with Power BI Service and Power BI Mobile Apps. This way, the entire cycle from report creation to publishing and consumption is completed.

5. What term is used to classify data such as word-processing files, images, video, and audio files?

  • Structured data
  • Semi-Structured data
  • Unstructured data (CORRECT)

That’s correct! Unstructured data examples are media files, word-processing files, images, video, and audio files.

SELF-REVIEW: DATA STORAGE AND MANAGEMENT

1. What is the main advantage of hybrid storage for Adventure Works?

  • It reduces the IT management overhead.
  • It is the most affordable storage solution.
  • It is suitable for storing only structured data.
  • It combines the benefits of on-premises and cloud-based storage solutions. (CORRECT)

That’s correct! Hybrid storage offers the flexibility and scalability of cloud storage, while maintaining control over sensitive information.

2. Which statement best describes structured data?

  • Data that is not organized in a predefined format, consisting of data types that do not fit neatly into rows and columns.
  • Data that is easily searchable and analyzable, consisting of data types that can be neatly arranged in rows and columns. (CORRECT)
  • Data that is stored on physical hardware located within the company’s premises.
  • Data that is stored on remote servers managed by a third-party provider.

That’s correct! Structured data is data that is organized in a specific format, making it easily searchable and analyzable. 

3. What data source mentioned in the case study is an example of unstructured data?

  • Financial data
  • Manufacturing data
  • Sales data
  • Social media and online reviews data (CORRECT)

That’s correct! These are examples of unstructured data, as they consist of data types that do not fit neatly into rows and columns, such as customer reviews, social media engagements, and multimedia content.

4. In the ETL process, which step involves retrieving raw data from different sources, such as databases and files?

  • Extract (CORRECT)
  • Load
  • Visualize
  • Transform

That’s correct! The Extract step involves retrieving raw data from different sources.

5. Which method of data ingestion is most suitable for gathering data from many Excel spreadsheets?

  • Database connections
  • Manual data entry
  • Web scraping
  • File-based ingestion (CORRECT)

That’s correct! File-based ingestion is most suitable as it involves importing data from various file formats, including Excel spreadsheets, and reading and parsing the file contents.
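
File-based ingestion can be sketched in Python as follows. This is a minimal illustration, assuming the spreadsheets have been exported to CSV text; the file names and contents are hypothetical, and in-memory strings stand in for files on disk so the example is self-contained.

```python
import csv
import io

# Stand-ins for several spreadsheet exports; real code would open files on disk.
files = {
    "north.csv": "ProductID,Quantity\n1,5\n2,3\n",
    "south.csv": "ProductID,Quantity\n1,2\n3,7\n",
}


def ingest(files):
    """Read and parse every file, tagging each row with the file it came from."""
    combined = []
    for name, text in files.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["SourceFile"] = name
            combined.append(row)
    return combined


rows = ingest(files)
```

Tagging each row with its source file is a design choice that makes it easier to trace a problem value back to the spreadsheet it came from.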

6. You need to consolidate data from multiple sources into a unified view. Which aspect of data management involves this task?

  • Data integration. (CORRECT)
  • Data governance.
  • Data quality.
  • Data retention and archiving.

That’s correct! Data integration is the process of consolidating different data from multiple sources, such as different departments or systems, into a unified view.
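
As a rough Python illustration of data integration, the sketch below joins records from two sources on a shared ProductID to produce one unified view. The feedback scores and the structure of both sources are assumptions made for the example, not values from the course datasets.

```python
# Source 1: inventory records keyed by ProductID.
inventory = {47: {"ProductName": "Kidz-K400", "RestockingFrequency": 45}}

# Source 2: feedback records from a separate system (scores are hypothetical).
feedback = [{"ProductID": 47, "Score": 4.0}, {"ProductID": 47, "Score": 3.0}]


def unified_view(inventory, feedback):
    """Combine both sources into one row per product with an average score."""
    scores = {}
    for item in feedback:
        scores.setdefault(item["ProductID"], []).append(item["Score"])

    view = []
    for product_id, info in inventory.items():
        product_scores = scores.get(product_id, [])
        average = sum(product_scores) / len(product_scores) if product_scores else None
        view.append({"ProductID": product_id, **info, "AvgScore": average})
    return view
```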

KNOWLEDGE CHECK: INTRODUCTION TO THE ETL PROCESS

1. What is a benefit of on-premises storage?

  • Full control over data and infrastructure. (CORRECT)
  • No need for physical hardware.
  • Easy scalability.
  • Reduced IT management overhead.

That’s correct! On-premises storage gives businesses full control over their data and infrastructure, which is suitable for businesses with strict security and compliance requirements or that store sensitive or mission-critical data.

2. What is the primary purpose of the Transform step in the ETL process?

  • To load the transformed data into the final storage system.
  • To clean, structure, and enrich the data to make it more suitable for analysis. (CORRECT)
  • To extract data from multiple sources.
  • To analyze and visualize the data.

That’s correct! The Transform step in the ETL process involves cleaning, structuring, and enriching the data to make it more suitable for analysis.

3. Which method of data ingestion allows real-time access to data but may require knowledge of database languages and complex configurations?

  • Data streaming
  • File-based ingestion
  • Web scraping
  • Database connections (CORRECT)

That’s correct! Database connections enable real-time access to data by directly connecting to a database management system and require knowledge of database languages and database schema and configurations to effectively query, retrieve, and manipulate data.
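
A database connection can be sketched with Python's built-in sqlite3 module. This is only an illustration: the in-memory database, the Sales table, and its rows are hypothetical stand-ins for a real database management system. It shows why knowledge of a database language (SQL here) and of the schema is needed to query the data.

```python
import sqlite3

# In-memory database as a stand-in for a real database management system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (ProductID INTEGER, Quantity INTEGER)")
conn.executemany("INSERT INTO Sales VALUES (?, ?)", [(1, 5), (2, 3), (1, 2)])


def total_quantity(conn, product_id):
    """Query the database directly; the result reflects the data at query time."""
    cur = conn.execute(
        "SELECT COALESCE(SUM(Quantity), 0) FROM Sales WHERE ProductID = ?",
        (product_id,),
    )
    return cur.fetchone()[0]
```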

4. What does source data refer to? Select all that apply.

  • Pre-processed data used for analysis and decision-making.
  • Data that has been analyzed and refined for specific purposes.
  • Raw, unprocessed information collected, stored, and managed by an organization. (CORRECT)
  • The initial input used as the basis for further processing, transformation, and analysis. (CORRECT)

That’s correct! Source data refers to the raw, unprocessed information collected, stored, and managed by an organization. 

That’s correct! Source data is the initial input used as the basis for further processing, transformation, and analysis.

5. Which aspect of data management is primarily responsible for establishing clear policies and procedures for data handling throughout an organization?

  • Data archiving
  • Data quality
  • Data security
  • Data governance (CORRECT)

That’s correct! Data governance is the aspect of data management that establishes clear policies, procedures, and standards for data handling throughout an organization.

SELF-REVIEW: EVALUATING DATA FOR TRANSFORMATION

1. Based on the Adventure Works Inventory dataset, what is the RestockingFrequency for the product Kidz-K400?

  • 30 days
  • 45 days (CORRECT)
  • 60 days
  • 90 days

Correct! According to the Inventory dataset, the RestockingFrequency for the product Kidz-K400 is 45 days. This can be verified by referring to the dataset, where ProductID 47, the Kidz-K400, has a RestockingFrequency of 45 days.

2. In the Customer Feedback dataset, which ProductID received a feedback score of 3,5 on May 23rd, 2023?

  • 51
  • 49 (CORRECT)
  • 52
  • 50

Correct! The product with ProductID 49 received a feedback score of 3,5 on May 23rd, 2023. A score of 3,5 suggests a level of customer satisfaction that is higher than neutral but short of high satisfaction.

3. According to the Adventure Works Sales dataset, what is the total quantity of products sold on 2023-05-05?

  • 2
  • 4
  • 3 (CORRECT)
  • 1

Correct! The total quantity of products sold on 2023-05-05 is 3. This can be confirmed by cross-referencing ProductID 35, TransactionID 35, and the SalesAmount of 750 in the dataset.

4. What is the primary goal of data cleaning in the context of data analysis? 

  • Processing and aggregating data to extract meaningful information.
  • Identifying and correcting errors and inconsistencies in datasets. (CORRECT)
  • Altering the structure, format, or values of the data.
  • Implementing data validation rules in the source system.

That’s correct! Data cleaning is a crucial step in the data analysis process, as it ensures the accuracy, reliability, and quality of the data used for generating insights. 

5. How does Power Query promote a structured and repeatable approach to data preparation? 

  • By recording data transformation steps in the Applied Steps pane. (CORRECT)
  • By automatically generating data visualizations.
  • By creating real-time dashboards.
  • By integrating with third-party applications.

That’s correct! Power Query promotes a structured and repeatable approach to data preparation by recording data transformation steps in the Applied Steps pane.
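
The idea of recorded, repeatable steps can be illustrated with a small Python analogy. This is not how Power Query is implemented; the step names and functions below are hypothetical. The point is that an ordered list of steps can be replayed against fresh data and will apply the same preparation every time.

```python
def trim_names(rows):
    """Step: remove leading and trailing spaces from the Name field."""
    return [{**row, "Name": row["Name"].strip()} for row in rows]


def drop_blank_names(rows):
    """Step: remove rows whose Name is empty."""
    return [row for row in rows if row["Name"]]


# An ordered, named list of steps, loosely analogous to a recorded step list.
APPLIED_STEPS = [
    ("Trimmed Text", trim_names),
    ("Removed Blank Rows", drop_blank_names),
]


def run_steps(rows, steps=APPLIED_STEPS):
    """Replay every recorded step, in order, against the input rows."""
    for _name, step in steps:
        rows = step(rows)
    return rows
```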

KNOWLEDGE CHECK: INTRODUCTION TO TRANSFORMING DATA

1. Which process involves altering the data’s structure, format, or values to make it more suitable for analysis?

  • Data validation
  • Data cleaning
  • Data aggregating
  • Data transformation (CORRECT)

That’s correct! Data transformation involves altering the structure, format, or values of the data to make it more suitable for analysis, such as aggregating data, converting data types, or normalizing values.

2. What is a primary advantage of cleaning data at the source?

  • Eliminating the need for data transformation.
  • Reducing the need for data documentation.
  • Ensuring future analyses have a clean and consistent foundation. (CORRECT)
  • Making it easier to import data into Microsoft Power BI.

That’s correct. Cleaning data at the source ensures that any future analyses using this data will have a clean and consistent foundation, saving time and effort in future analyses.

3. Which Excel function can be used to convert text strings to proper case, capitalizing the first character of each word?

  • MATCH()
  • SUMIF()
  • PROPER() (CORRECT)
  • UPPER()

That’s correct! The PROPER function capitalizes the first character of each word in a piece of text and converts the remaining letters to lowercase. It requires only one argument: the text, or a reference to the cell containing the text, that you want the function to work on.
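
For readers who know Python, the behavior of PROPER can be approximated with the built-in str.title() method. This is an analogy, not Excel itself, and the two may differ on unusual input; the function name proper below is a hypothetical helper.

```python
def proper(text):
    """Approximate Excel's PROPER: capitalize each word's first letter, lowercase the rest."""
    return text.title()
```

For example, proper("adventure WORKS cycles") returns "Adventure Works Cycles".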

4. What is the primary function of Power Query in the Microsoft Power BI suite?

  • Data visualization.
  • Sharing and collaboration.
  • Creating advanced calculations.
  • Data connectivity and preparation. (CORRECT)

That’s correct! Power Query’s primary function is data connectivity and preparation. It enables users to connect to various data sources, clean and transform the data, and then load it into Microsoft Power BI data models for further analysis and visualization. This streamlines and automates the process of preparing data for analysis, making it easier for users to gain valuable insights from their data. 

5. Which data transformation functions are commonly performed in Power Query? Select all that apply.

  • Changing data types. (CORRECT)
  • Filling in missing values. (CORRECT)
  • Encrypting data.
  • Removing duplicates. (CORRECT)

That’s correct! Changing data types is a common data transformation function in Power Query. Converting data to appropriate data types is crucial for facilitating data analysis and visualization. Power Query can help to change data types, ensuring consistency across the dataset.

That’s correct! Filling in missing values is a common data transformation function in Power Query. Missing values can hinder accurate analysis and interpretation, as they may cause inconsistencies and errors in calculations or aggregations. Power Query helps to fill in these missing values, improving the quality and usability of the data.

That’s correct! Removing duplicates is a common data transformation function in Power Query. Duplicate entries can skew the results of an analysis, so it is essential to identify and remove duplicates to ensure the accuracy and reliability of any data-driven insights.
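
The three transformations above can be sketched in Python to show what each one does to the data. This is an analogy rather than Power Query code; the sample rows and the choice of 0 as the fill value are assumptions for illustration.

```python
def prepare(rows, default_qty=0):
    """Fill missing values, change data types, and remove duplicates, in that order."""
    result, seen = [], set()
    for product_id, qty in rows:
        # Fill in missing values with an assumed default.
        qty = default_qty if qty in ("", None) else qty
        # Change data types from text to whole numbers.
        record = (int(product_id), int(qty))
        # Remove duplicates, keeping the first occurrence.
        if record not in seen:
            seen.add(record)
            result.append(record)
    return result


raw = [("47", "10"), ("47", "10"), ("48", ""), ("49", None)]
```

Whether a missing quantity should become 0, be filled from a neighboring row, or be dropped depends on the business need; the default here is only a placeholder.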

MODULE QUIZ: THE RIGHT TOOLS FOR THE JOB

1. What is the purpose of the transform stage in the ETL process?

  • Removing duplicates and refining the data. (CORRECT)
  • Combining different data sources. 
  • Loading transformed data into a data warehouse. 
  • Retrieving raw data from different sources. 

Correct! The transform stage is primarily focused on refining the data, which includes operations like removing duplicates, converting data types, and handling missing values.

2. Which of the following tasks are related to data ingestion in the ETL process? Select all that apply:

  • Obtaining data from various sources. (CORRECT)
  • Cleaning and formatting data for analysis.
  • Importing data for immediate use or storage in a database. (CORRECT)
  • Loading data into a target database or data warehouse.

Correct! Data ingestion involves obtaining data from various sources for immediate use or storage in a database.

Correct! Data ingestion is the process of importing data from various sources for immediate use or storage in a database.

3. What is a crucial factor to consider when estimating storage capacity for data storage?

  • The number of employees who analyze data.
  • The size of your organization. (CORRECT)
  • The operating system used
  • The time of day data is most frequently accessed 

Correct! The size of your organization is an important factor when determining how much storage capacity you need.

4. In the context of data analysis, which of the following statements are true about data cleaning and data transformation? Select all that apply.

  • Data cleaning involves identifying and correcting errors and inconsistencies in datasets. (CORRECT)
  • Data transformation is an ongoing process, while data cleaning is a one-time process.
  • Data transformation involves altering the data’s structure, format, or values to make it more suitable for analysis. (CORRECT)
  • Data cleaning is only done at the source.

Correct! Data cleaning focuses on identifying and correcting errors and inconsistencies in datasets, such as removing duplicate entries, filling in missing values, and fixing incorrect data types.

Correct! Data transformation focuses on altering the structure, format, or values of the data to make it more suitable for analysis.

5. Which Excel feature can help you quickly spot errors, outliers, or patterns in your data by applying different formats to cells based on specific conditions?

  • Data validation
  • VLOOKUP()
  • Conditional formatting (CORRECT)
  • TRIM()

Correct! Conditional formatting enables you to apply different formats (such as colors, fonts, or icons) to cells based on specific conditions, helping you quickly spot errors, outliers, or patterns in your data.
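
The logic behind a conditional formatting rule can be illustrated in Python: each value is checked against a condition, and a format is chosen based on the result. The color labels, range limits, and function name are hypothetical; Excel applies such rules through its own interface rather than through code like this.

```python
def highlight(values, low, high):
    """Return a format label per value, like a rule that colors out-of-range cells."""
    labels = []
    for value in values:
        if value is None:
            labels.append("yellow")  # missing entry
        elif value < low or value > high:
            labels.append("red")     # outside the expected range
        else:
            labels.append("none")    # no special format
    return labels
```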

6. True or False: Microsoft Power Query’s main purpose in the Microsoft Power BI suite is to generate data visualizations automatically.

  • True
  • False (CORRECT)

Correct! Microsoft Power Query’s main purpose in the Microsoft Power BI suite is to connect to multiple data sources, clean and transform data, and load it into Power BI data models.

7. True or False: Having too many visualizations is a common issue in raw data that can hinder accurate analysis.

  • True
  • False (CORRECT)

Correct! Having too many visualizations is an issue of report presentation, not of the quality or structure of the raw data itself, so it is not a raw-data issue that hinders accurate analysis.

8. What is the main goal of identifying and evaluating required data for a business decision?

  • To gather as much data as possible.
  • To avoid using external data sources.
  • To understand the factors that influence the decision and collect relevant data. (CORRECT)
  • To focus only on structured data sources.

Correct! Identifying and evaluating required data helps you understand the factors that influence the decision and collect the most relevant data for the analysis.

9. What type of data source is an Enterprise Resource Planning (ERP) system classified as?

  • Unstructured data
  • Structured data (CORRECT)
  • Semi-Structured data
  • Streaming data

Correct! Structured Data sources like ERP systems are organized, rule-based, and suitable for reporting.

10. In a typical Microsoft Power BI workflow, what is the primary purpose of Microsoft Power BI Desktop?

  • To view and interact with reports.
  • To assign user permissions.
  • To design and create reports. (CORRECT)
  • To share dashboards with other users.

Correct! Microsoft Power BI Desktop is primarily used by data analysts or report designers to clean, transform, and load data, create a data model, design reports, and publish these reports.

11. In the context of the ETL process, what does the term data ingestion primarily refer to?

  • Obtaining and importing data from various sources. (CORRECT)
  • Converting raw data into insights.
  • Loading transformed data into a data warehouse.
  • Cleaning and formatting data.

Correct! In the ETL process, data ingestion primarily refers to obtaining and importing data from various sources for immediate use or storage in a database.

12. When estimating storage capacity for data storage, which factor is the least relevant?

  • The number of departments within the organization.
  • How long you need to store the data.
  • The time of day data is most frequently accessed (CORRECT)
  • The size of your organization.

Correct! While access times can impact system performance and might influence when to schedule intensive tasks, they don’t directly impact how much storage capacity you need.

13. What task is primarily involved in the process of data cleaning in the context of data analysis?

  • Transforming the data to be more suitable for Microsoft Power BI.
  • Removing duplicate entries from datasets.
  • Identifying and correcting errors and inconsistencies in datasets. (CORRECT)
  • Altering the structure, format, or values of the data.

Correct! Data cleaning involves identifying and correcting errors and inconsistencies in datasets, while data transformation involves altering the data’s structure, format, or values to make it more suitable for analysis.

14. Which of the following are main purposes of Microsoft Power Query in the Microsoft Power BI suite? Select all that apply.

  • To clean and transform data. (CORRECT)
  • To load data into Microsoft Power BI data models. (CORRECT)
  • To connect to multiple data sources. (CORRECT)
  • To create real-time dashboards.

Correct! Microsoft Power Query provides various data cleaning and transformation capabilities, making it easier to prepare data for analysis and visualization.

Correct! Microsoft Power Query enables you to load cleaned and transformed data into Power BI data models for further analysis and visualization.

Correct! Microsoft Power Query is designed to connect to various data sources, allowing you to import data into the Power BI data models.

15. Which of the following are main goals of identifying and evaluating required data for a business decision? Select all that apply:

  • To understand the factors that influence the decision and collect relevant data. (CORRECT)
  • To gather as much data as possible.
  • To consider both internal and external data sources. (CORRECT)
  • To consider all types of data sources, including structured, semi-structured, and unstructured data. (CORRECT)

Correct! Identifying and evaluating required data helps you understand the factors that influence the decision and collect the most relevant data for the analysis.

Correct! Identifying and evaluating required data often involves considering both internal and external data sources to gain the most comprehensive understanding of the factors influencing a decision.

Correct! Identifying and evaluating required data involves considering all types of data sources to ensure a comprehensive understanding of the factors influencing a decision.

16. Which of the following data sources are classified as structured data sources? Select all that apply.

  • Log files
  • Relational databases (CORRECT)
  • Messages
  • Enterprise Resource Planning (ERP) system (CORRECT)

Correct! Relational databases are examples of Structured Data sources, as they are organized and rule-based.

Correct! Structured Data sources like ERP systems are organized, rule-based, and suitable for reporting.

17. Which of the following tasks are performed during the transform stage of the ETL process? Select all that apply:

  • Loading transformed data into a data warehouse.
  • Converting data types. (CORRECT)
  • Handling missing values. (CORRECT)
  • Retrieving raw data from different sources.

Correct! Converting data types is one of the tasks performed during the transform stage of the ETL process.

Correct! Handling missing values is one of the tasks performed during the transform stage of the ETL process.

18. Which of the following factors should be considered when estimating storage capacity for data storage? Select all that apply.

  • The size of your organization. (CORRECT)
  • The color of the storage devices.
  • How long you need to store the data. (CORRECT)
  • The type of data you collect. (CORRECT)

Correct! The size of your organization is an important factor when determining how much storage capacity you need.

Correct! How long you need to store the data is an important factor when estimating storage capacity for data storage.

Correct! The type of data you collect is a crucial factor when estimating storage capacity for data storage.

19. Which of the following Microsoft Excel features can help you quickly spot errors, outliers, or patterns in your data by applying different formats to cells based on specific conditions? 

  • Data validation
  • TRIM()
  • VLOOKUP()
  • Conditional formatting (CORRECT)

20. Which of the following are common issues in raw data that can hinder accurate analysis? Select all that apply:

  • Too few data points. (CORRECT)
  • Too many visualizations.
  • Incompatible data sources. (CORRECT)
  • Missing values. (CORRECT)

Correct! Having too few data points can limit the insights you can draw from the data, which can hinder accurate analysis.

Correct! Incompatible data sources can be a challenge when importing data, leading to issues in the raw data that can hinder accurate analysis.

Correct! Missing values in raw data can hinder accurate analysis and interpretation, as they may cause inconsistencies and errors in calculations or aggregations.

21. True or False: The main goal of identifying and evaluating required data for a business decision is to gather as much data as possible.

  • True
  • False (CORRECT)

22. True or False: An Enterprise Resource Planning (ERP) system is classified as an Unstructured data source.

  • True
  • False (CORRECT)

23. Which of the following are primary purposes of Microsoft Power BI Desktop? Select all that apply.

  • To clean, transform, and load data. (CORRECT)
  • To view and interact with reports.
  • To design and create reports. (CORRECT)
  • To assign user permissions.

Correct! Microsoft Power BI Desktop is used to clean, transform, and load data, as well as to create data models and design reports.

Correct! Microsoft Power BI Desktop is primarily used by data analysts or report designers to clean, transform, and load data, create a data model, design reports, and publish these reports.

24. What is a common issue in raw data that can hinder accurate analysis?

  • Too few data points.
  • Too many visualizations.
  • Incompatible data sources.
  • Missing values. (CORRECT)

25. Which stage of the ETL process is responsible for loading the transformed data into a data warehouse or another storage system for analysis?

  • Load (CORRECT)
  • Extract
  • Visualize
  • Transform

Correct! The Load stage in the ETL process is about loading the transformed data into a data warehouse or another storage system where it can be accessed and analyzed by various tools like Power BI.
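
The three ETL stages covered in this guide can be sketched end to end in Python. This is a minimal illustration under stated assumptions: the raw text stands in for a source file, an in-memory SQLite database stands in for a data warehouse, and the choice to skip rows with missing amounts is only one possible way to handle them.

```python
import csv
import io
import sqlite3

# Hypothetical source data containing a duplicate row and a missing value.
RAW = "ProductID,SalesAmount\n35,750\n35,750\n36,\n"


def extract(text):
    """Extract: retrieve raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop duplicates, skip rows with missing amounts, convert types."""
    seen, clean = set(), []
    for row in rows:
        key = (row["ProductID"], row["SalesAmount"])
        if key in seen or row["SalesAmount"] == "":
            continue
        seen.add(key)
        clean.append((int(row["ProductID"]), float(row["SalesAmount"])))
    return clean


def load(rows, conn):
    """Load: write the transformed rows to the destination store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Sales (ProductID INTEGER, SalesAmount REAL)"
    )
    conn.executemany("INSERT INTO Sales VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
```

Keeping each stage in its own function mirrors the way the process is described: each stage has one responsibility and hands its output to the next.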

26. What is the primary difference between data cleaning and data transformation in the context of data analysis?

  • Data cleaning focuses on removing duplicate entries, while data transformation focuses on altering data structure.
  • Data cleaning is a one-time process, while data transformation is an ongoing process.
  • Data cleaning is only done at the source, while data transformation is only done in Microsoft Power BI.
  • Data cleaning involves identifying and correcting errors and inconsistencies in datasets, while data transformation involves altering the data’s structure, format, or values. (CORRECT)

Correct! Data cleaning involves identifying and correcting errors and inconsistencies in datasets, while data transformation involves altering the data’s structure, format, or values to make it more suitable for analysis.

27. True or False: The primary purpose of Microsoft Power BI Desktop is to share dashboards with other users.

  • True
  • False (CORRECT)

Correct! The primary purpose of Microsoft Power BI Desktop is to clean, transform, and load data, create a data model, and design and publish reports. Sharing dashboards is done within Power BI Service.

28. What is the primary purpose of data ingestion in the ETL process?

  • To clean and format data for analysis.
  • To load data into a target database or data warehouse.
  • To obtain and import data from various sources for immediate use or storage in a database. (CORRECT)
  • To analyze data and extract insights.

Correct! Data ingestion is a crucial ETL step bringing data together from different sources and formats, enabling fully informed analysis and decision-making.

29. Which Microsoft Excel feature enables setting criteria for allowable data in a cell or range of cells?

  • VLOOKUP()
  • Data validation (CORRECT)
  • TRIM()
  • Conditional formatting

Correct! Data Validation in Excel allows you to set criteria for the allowable data in a cell or range of cells. Data validation can help you prevent errors and inconsistencies in your data.
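
The idea behind data validation, checking an entry against criteria before accepting it, can be illustrated in Python. The rule below (whole numbers from 1 to 1000) and the function names are hypothetical; in Excel, such criteria are configured through the Data Validation feature rather than written as code.

```python
def validate_quantity(value):
    """Accept only whole numbers from 1 to 1000 (a hypothetical rule)."""
    is_whole_number = isinstance(value, int) and not isinstance(value, bool)
    return is_whole_number and 1 <= value <= 1000


def enter(cells, value):
    """Add a value to the list of cells only if it passes validation."""
    if not validate_quantity(value):
        raise ValueError(f"Rejected entry: {value!r}")
    cells.append(value)
```

Rejecting a bad entry at the point of input prevents the error from reaching the dataset, which is the same motivation given for cleaning data at the source.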

CONCLUSION – The right tools for the job

In conclusion, this module provides learners with a foundation in the critical aspects of data management: collecting data and identifying valuable data sources, working through the ETL process, and evaluating data before analysis. With these skills, learners can prepare data effectively and ensure its quality and reliability, which supports sound analysis and informed decision-making.