COURSE 3 – EXTRACT, TRANSFORM AND LOAD DATA IN POWER BI

Module 3: Advanced ETL in Power BI

MICROSOFT POWER BI DATA ANALYST PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

INTRODUCTION – Advanced ETL in Power BI

In this module, you will develop a thorough understanding of the load portion of the Extract, Transform, Load (ETL) process, which is a critical phase in data integration and management. You will delve into the intricacies of how data is loaded into various storage systems, ensuring it is accessible and ready for analysis.

Additionally, this module will provide you with practical experience in data profiling, allowing you to assess the quality, structure, and content of your data effectively. Alongside this, you will practice advanced querying techniques, enabling you to retrieve and manipulate data efficiently to meet specific analytical needs. By mastering these concepts and skills, you will be well-prepared to handle complex data tasks in real-world scenarios, ensuring data is well-organized and primed for subsequent analysis and reporting activities.

Learning Objectives

  • Define the “load” portion of ETL.
  • Describe how Power BI is used to load data at a high level.
  • Identify data anomalies using profiling tools.
  • Use profiling tools to identify data anomalies.
  • Explain best practices and advanced query capabilities.

KNOWLEDGE CHECK: LOADING DATA WITH POWER BI

1. Which step in the ETL process includes sending the transformed data into the target database?

  • Transform
  • Extract
  • Load (CORRECT)

That’s correct! The final step, Load, includes sending the transformed data into the target database.

2. Which of the following options can describe staging? Select all that apply.

  • Used primarily in Exploring Relationships.
  • Simplifies the process of data cleansing and consolidation for operational data. (CORRECT)
  • Allows raw data to be stored and prepared for further processing in a data pipeline. (CORRECT)
  • Acts as a bridge between the data sources and the data warehouse. (CORRECT)

That’s correct! A staging area simplifies the process of data cleansing and consolidation for operational data originating from multiple source systems, particularly for enterprise data warehouses that centralize an organization’s critical data.

That’s correct! The staging area allows raw data to be temporarily stored and prepared for further processing in a data pipeline.

That’s correct! The staging area typically acts as a bridge between the data sources and the data warehouse.

3. True or False: The existence of a data staging area is not obligatory for your ETL jobs.

  • True (CORRECT)
  • False

That’s correct! The existence of a data staging area is not obligatory for your ETL jobs. However, it is recommended to simplify the data cleansing process and consolidate data coming from multiple sources.

4. You import two Microsoft Excel tables named SalesData2020 and SalesData2021 into Power Query.

Both tables have the same column names:

  • SalesDate
  • ProductKey
  • ResellerKey
  • EmployeeKey
  • Quantity
  • Discount
  • Unit Price
  • Sales

Your manager asks you to transform these Excel tables and append them, but to keep the originals only as source tables rather than using them in the data warehouse. What should you do to accomplish this task? Select all that apply.

  • Import SalesData2020 and SalesData2021 tables. (CORRECT)
  • Right-click the tables and disable the load by clearing the checkbox Enable load. (CORRECT)
  • Load data of SalesData2020 and SalesData2021 tables.
  • Right-click the Queries pane at the left menu of Power Query, select New Group and create a group. Then move the tables to the newly created group. (CORRECT)

That’s correct! You must import SalesData2020 and SalesData2021 tables as the first step.

That’s correct! In this way, they will still be used in queries, but you will not bring them into the data model.

That’s correct! You will create one Sales History table out of these two tables, but you still need to keep the two source queries. So, you create a new group for them and organize your tables according to your needs.
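The staging pattern in this question can be sketched outside Power Query. The following Python snippet is only an illustration, not Power Query's own mechanism: the sample rows, the `enable_load` flag, and the `SalesHistory` name are hypothetical stand-ins for the Enable load checkbox and the appended query.

```python
# Hypothetical sample rows standing in for the two Excel tables.
sales_2020 = [
    {"SalesDate": "2020-03-01", "ProductKey": 101, "Quantity": 2, "Sales": 40.0},
    {"SalesDate": "2020-07-15", "ProductKey": 102, "Quantity": 1, "Sales": 15.5},
]
sales_2021 = [
    {"SalesDate": "2021-01-10", "ProductKey": 101, "Quantity": 3, "Sales": 60.0},
]

# Staging queries: kept as sources, flagged so they are not loaded.
staging = {
    "SalesData2020": {"rows": sales_2020, "enable_load": False},
    "SalesData2021": {"rows": sales_2021, "enable_load": False},
}

# Appended query built from the staging queries; only this one is loaded.
sales_history = {
    "rows": staging["SalesData2020"]["rows"] + staging["SalesData2021"]["rows"],
    "enable_load": True,
}

queries = {**staging, "SalesHistory": sales_history}
loaded = [name for name, query in queries.items() if query["enable_load"]]
print(loaded)
print(len(sales_history["rows"]))
```

The staging tables still feed the appended table, but only the appended table reaches the model.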

5. When the Power Query window is opened and the transformations are completed, you are ready to load the tables into the reporting area. What should you do to complete this task?

  • Close
  • Apply
  • Close & Apply (CORRECT)

That’s correct! Close & Apply closes the Power Query window and applies the changes to the data model.

6. Which option must be selected to load data directly into the reporting area when the Get Data window is opened? Select the correct option.

  • Load (CORRECT)
  • Transform Data
  • Cancel

That’s correct! The Load button directly loads the data from the data source.

7. Which of these can be considered advantages of using staging in Power BI?

Select all that apply.

  • Consolidate data easier. (CORRECT)
  • Manage data effectively. (CORRECT)
  • Identify errors in datasets.
  • Combine datasets.

That’s correct! A staging area makes data cleansing and consolidation from various systems easier.

That’s correct! By organizing your tables and queries, you manage your data sources and their queries effectively.

SELF-REVIEW: PROFILING A DATASET

1. Which of the following operations gives you the percentage of valid, error, or empty values found in columns?

  • Column quality (CORRECT)
  • Column profile
  • Column distribution

That’s correct! Column quality gives you the percentage of valid, error, or empty values found in columns.
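To make the three percentages concrete, here is a minimal Python sketch of the calculation. It is not how Power Query computes column quality internally. The sample column is hypothetical, and treating `None` or an empty string as "empty" and an exception object as "error" are assumptions made for illustration.

```python
def column_quality(values):
    """Return the percentage of valid, error, and empty values in a column."""
    total = len(values)
    if total == 0:
        return {"valid": 0.0, "error": 0.0, "empty": 0.0}
    # Assumption: None or "" marks an empty cell, an Exception marks an error.
    empty = sum(1 for v in values if v is None or v == "")
    error = sum(1 for v in values if isinstance(v, Exception))
    valid = total - empty - error
    return {
        "valid": round(100 * valid / total, 1),
        "error": round(100 * error / total, 1),
        "empty": round(100 * empty / total, 1),
    }

# Hypothetical column: 10 values, one empty and one failed conversion.
product_key = [101, 102, None, 104, ValueError("not a number"),
               106, 107, 108, 109, 110]
quality = column_quality(product_key)
print(quality)
```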

2. What is the empty percentage of ProductKey column rows?

  • 13
  • 11 (CORRECT)
  • 16

That’s correct! 11% of the ProductKey column row values are empty. You can verify this by selecting the ProductKey column and checking the Column Profile checkbox in the Data Preview group on the View ribbon tab.

3. How many unique and distinct values are there in the Price column?

  • 18 distinct, 17 unique (CORRECT)
  • 18 distinct, 18 unique
  • 17 distinct, 17 unique

That’s correct! There are 18 distinct and 17 unique values in the Price column. You can verify this by checking the Column Profile checkbox in the Data Preview group on the View ribbon tab.

4. Which column characteristic gives the most frequently repeated value in selected records? Select the correct option.

  • Mode (CORRECT)
  • Average
  • Min

That’s correct! Mode is the most frequently repeated value in a record set. It is primarily used in statistics as a benchmark against the average.

5. Which menu item gives the counts of distinct and unique row values for a selected column?

  • Column Distribution (CORRECT)
  • Column Profile
  • Column Quality

That’s correct! Column distribution shows the distinct and unique counts below the validity information.

KNOWLEDGE CHECK: DATA PROFILING IN POWER BI

1. Which of the following Power Query menu items provides the user with column information like the number of empty and distinct rows and rows with errors?  

  • Column Profile (CORRECT)
  • Column Distribution
  • Column Quality

That’s correct! Column profile gives you value distribution on the selected column and statistics such as Minimum, Maximum, Average (Mean), Frequently Occurring Values (Mode), and Standard Deviation for the row values.
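The statistics named in this answer can be reproduced with Python's standard `statistics` module. This is a rough sketch under assumptions: the `price` column is hypothetical, and the sample standard deviation is used here, which may differ from the figure Power Query reports.

```python
import statistics

def column_profile(values):
    """Summarize a numeric column, ignoring empty (None) cells."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "empty": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present),
        "max": max(present),
        "average": statistics.mean(present),
        "mode": statistics.mode(present),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "std_dev": statistics.stdev(present),
    }

# Hypothetical Price column with one empty cell.
price = [10, 20, 20, 30, None, 40]
profile = column_profile(price)
print(profile)
```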

2. Which of the following is defined as a data point that significantly deviates from other observations?

  • Anomaly
  • Outlier (CORRECT)
  • Standard Deviation

That’s correct! The formal definition of an outlier in statistics is a data point that significantly deviates from other observations. An outlier refers to an individual data point or a group of data points that deviates significantly from the remaining data set. On the other hand, an anomaly represents a single point or a group of points that exhibit considerable distance from other points in the multi-dimensional feature space.
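One common way to flag values that deviate significantly is to measure how many standard deviations each value lies from the mean. The sketch below assumes a threshold of 2 standard deviations and a hypothetical `quantities` column. Both are illustrative choices, not rules taken from the course.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)
    if std_dev == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mean) / std_dev > threshold]

# Hypothetical Quantity column with one suspicious entry.
quantities = [10, 12, 11, 13, 12, 11, 10, 95]
outliers = find_outliers(quantities)
print(outliers)
```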

3. True or False: Distinct is known as “total number of different values”, regardless of how many of each we have. Unique is known as “total number of values that only appear once”. In this case, for fields with Primary Key or Unique Constraint defined, the values of Unique and Distinct will be equal.

  • True (CORRECT)
  • False

That’s correct! Distinct is known as “total number of different values”, and Unique is known as “total number of values that only appear once”.

4. You import an Excel table named EmployeeData2023 into Power Query. You removed all other columns except Country.

The Country column has the following 10 row values:

  • USA
  • France
  • France
  • Ireland
  • England
  • England
  • USA
  • USA
  • Spain
  • France

What are the unique and distinct values of this column?

  • 3 unique and 7 distinct
  • 2 unique and 8 distinct
  • 1 unique and 5 distinct
  • 2 unique and 5 distinct (CORRECT)

That’s correct! Spain and Ireland appear only once in the rows, so the unique amount is 2. Distinct values are the USA, France, Ireland, England and Spain. So, there are 5 distinct values.
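The counts in this answer can be checked with a few lines of Python. This is only a verification sketch using the ten Country values from the question. Distinct counts every different value once, and unique counts only the values that appear exactly once.

```python
from collections import Counter

countries = ["USA", "France", "France", "Ireland", "England",
             "England", "USA", "USA", "Spain", "France"]

counts = Counter(countries)

# Distinct: number of different values, regardless of repetition.
distinct = len(counts)
# Unique: number of values that occur exactly once.
unique = sum(1 for n in counts.values() if n == 1)

print(distinct, unique)
```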

5. You need to identify if data in a column contains Empty values. Which of the following can be used to quickly identify this? Select all that apply.

  • Column Distribution
  • Column Quality (CORRECT)
  • Column Profile (CORRECT)

That’s correct! Column quality gives valid, error and empty rows percentages in each column.

That’s correct! Column profile gives you statistics including empty row amounts.

KNOWLEDGE CHECK: ADVANCED QUERIES

1. Which of the following statements is true about Power BI dataflows?

  • Dataflows allow advanced coding for complex transformations.
  • Dataflows can only connect to cloud-based data sources.
  • Dataflows are only available in Power BI Desktop.
  • Dataflows enable self-service data preparation and integration. (CORRECT)

That’s correct! Power BI dataflows empower business users to perform self-service data preparation and integration tasks.

2. Which of the following join types is the most efficient merge strategy when merging or joining multiple queries?

  • Inner join (CORRECT)
  • Full outer join
  • Left outer join

That’s correct! The most efficient merge strategy when merging or joining multiple queries is the inner join type.
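A small Python sketch can show why an inner join tends to produce less data than an outer join: it keeps only rows with a match on both sides. The tables and helper functions below are hypothetical and do not reflect how Power Query implements merges.

```python
def build_index(rows, key):
    """Group rows by their key value for quick lookup."""
    index = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index

def inner_join(left, right, key):
    """Keep only left rows that have at least one match on the right."""
    index = build_index(right, key)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def left_outer_join(left, right, key):
    """Keep every left row; fill right-hand columns with None when unmatched."""
    index = build_index(right, key)
    right_columns = {c for row in right for c in row} - {key}
    result = []
    for l in left:
        matches = index.get(l[key])
        if matches:
            result.extend({**l, **r} for r in matches)
        else:
            result.append({**l, **{c: None for c in right_columns}})
    return result

# Hypothetical tables: ProductKey 4 has no matching product.
sales = [{"ProductKey": 1, "Quantity": 5},
         {"ProductKey": 2, "Quantity": 3},
         {"ProductKey": 4, "Quantity": 7}]
products = [{"ProductKey": 1, "Name": "Helmet"},
            {"ProductKey": 2, "Name": "Gloves"},
            {"ProductKey": 3, "Name": "Bottle"}]

inner = inner_join(sales, products, "ProductKey")
left_outer = left_outer_join(sales, products, "ProductKey")
print(len(inner), len(left_outer))
```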

3. True or False: You prioritize and schedule resource-intensive operations towards the end of the data transformation process. This approach ensures that complex calculations, merging large datasets, and applying multiple transformations on a significant number of rows are executed efficiently, leading to faster data loading and more responsive reports.

  • True (CORRECT)
  • False

That’s correct! One key principle to keep in mind is to “Do expensive operations last.” This means that you should prioritize and schedule resource-intensive operations towards the end of your data transformation process.
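The "Do expensive operations last" principle can be illustrated with a sort, which is costly on large tables. In the hypothetical Python sketch below, filtering first means the sort only has to handle a quarter of the rows, while the final result is identical. The data and function names are made up for illustration.

```python
# Hypothetical dataset: 1000 rows, one in four belongs to region "EU".
rows = [{"Region": "EU" if i % 4 == 0 else "US", "Sales": (i * 37) % 101}
        for i in range(1000)]

def sort_then_filter(data):
    """Expensive step first: the sort processes every row."""
    ordered = sorted(data, key=lambda r: r["Sales"])
    result = [r for r in ordered if r["Region"] == "EU"]
    return result, len(data)  # second item: rows passed to the sort

def filter_then_sort(data):
    """Expensive step last: the sort processes only the rows that remain."""
    kept = [r for r in data if r["Region"] == "EU"]
    return sorted(kept, key=lambda r: r["Sales"]), len(kept)

slow_result, slow_sorted = sort_then_filter(rows)
fast_result, fast_sorted = filter_then_sort(rows)
print(slow_sorted, fast_sorted, slow_result == fast_result)
```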

4. You are working on a data analysis project for Adventure Works. The company provides you with a large dataset containing information about sales, production, and inventory. Your task is to perform complex data transformations and calculations to derive meaningful insights from the data.

In this scenario, how can the Advanced Editor in Power BI be beneficial for your data analysis process?

  • It allows you to import and connect to various data sources to consolidate the dataset.
  • It provides advanced scripting capabilities for customizing data transformations. (CORRECT)
  • It offers a wide range of visualizations to present the analysis results.

That’s correct! The Advanced Editor in Power BI provides advanced scripting capabilities for customizing data transformations. It allows you to write custom formulas, apply complex data manipulation logic, and perform intricate calculations on your dataset. This powerful feature gives you greater control and flexibility in shaping your data according to specific business requirements, enabling you to derive more accurate and meaningful insights. 

5. Which of the following can be considered benefits of Reference Queries? Select all that apply.

  • Reusability (CORRECT)
  • Efficiency (CORRECT)
  • Scalability  (CORRECT)
  • Connectivity

That’s correct! By referencing queries, you can reuse common data transformations across multiple queries. 

That’s correct! Referencing queries eliminates the need to repeat time-consuming data transformation steps. 

That’s correct! As your data analysis requirements grow, referencing queries allows you to build modular and scalable data transformation workflows. 

6. Which one of the following options is considered a best practice when working with Power Query?

  • Performing expensive operations at the end of the process. (CORRECT)
  • Choosing the most visually appealing color palette for your reports.
  • Keeping all data in a single table for simplicity.

That’s correct! By following the “Do expensive operations last” principle, you can optimize the performance and efficiency of your data transformations in Power BI.

7. What is the primary purpose of dataflows in Power BI?

  • To centralize and manage data preparation processes. (CORRECT)
  • To create interactive visualizations for business insights.
  • To visualize and analyze data in Power BI reports.

That’s correct! Dataflows in Power BI serve as a centralized platform for organizing and managing data preparation tasks.

8. Referencing queries eliminates the need to repeat time-consuming data transformation steps. 

Which of the following benefits of using query references fits this definition? 

  • Efficiency (CORRECT)
  • Reusability
  • Scalability

That’s correct! Referencing queries eliminates the need to repeat time-consuming data transformation steps.

9. Which of the following impacts must be considered when choosing to use a Reference Query? 

Select all that apply.

  • Frequency and timing of refreshes. (CORRECT)
  • Refreshing referenced queries. (CORRECT)
  • Ease of maintenance. (CORRECT)
  • Centralization of data.

That’s correct! How often and when refreshes occur can have a significant impact on data sources.

That’s correct! Queries that are referenced by the reference query must be refreshed to maintain data consistency.

That’s correct! When introducing multiple reference queries and layers, the time required to maintain these will increase.

10. What is the primary purpose of using query parameters in Power BI?

  • To create adaptable, reusable queries for efficient data retrieval and transformation. (CORRECT)
  • To change the visual design of the reports.
  • To connect Power BI with different databases.
  • To promote ease of query duplication for different variables.

That’s correct! Query parameters in Power BI allow you to create adaptable, reusable queries that enhance data retrieval and transformation.
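The idea behind query parameters resembles a function with arguments: one query definition serves many cases instead of being duplicated per value. The Python sketch below is an analogy only. The `sales_query` function, its parameters, and the sample rows are hypothetical.

```python
def sales_query(rows, year, min_quantity=0):
    """One reusable query; `year` and `min_quantity` act as parameters."""
    return [r for r in rows
            if r["SalesDate"].startswith(str(year))
            and r["Quantity"] >= min_quantity]

# Hypothetical sales rows.
sales = [
    {"SalesDate": "2020-03-01", "Quantity": 2},
    {"SalesDate": "2020-07-15", "Quantity": 9},
    {"SalesDate": "2021-01-10", "Quantity": 3},
]

# The same query is reused with different parameter values.
sales_2020 = sales_query(sales, 2020)
large_2020 = sales_query(sales, 2020, min_quantity=5)
sales_2021 = sales_query(sales, 2021)
print(len(sales_2020), len(large_2020), len(sales_2021))
```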

11. What are the functionalities of the Advanced Editor in Microsoft Power Query? Select all that apply.

  • To write, modify, and debug M code. (CORRECT)
  • To perform data transformations, calculations, and aggregations. (CORRECT)
  • To design data models and create visualizations for analysis.
  • To connect to external data sources. 

That’s correct! You can read and modify M language directly in the Advanced Editor. It provides syntax highlighting, auto-completion, and error-checking features, making it easier to write and debug your M code. 

That’s correct! Using the capabilities of the M language, the Advanced Editor offers functions and operators that allow you to perform various data transformations, calculations, and aggregations.

MODULE QUIZ: ADVANCED ETL IN POWER BI

1. The ________ provides syntax highlighting, auto-completion, and error-checking features, making it easier to write and debug your ________.

  • Advanced Editor, M code (CORRECT)
  • Advanced Editor, DAX
  • Transform Data, DAX 

That’s correct! The Advanced Editor provides syntax highlighting, auto-completion, and error-checking features making it easier to write and debug your M code.

2. ____ helps you to gain a comprehensive understanding of data quality, structure, and distribution. 

  • Extracting data 
  • Transforming data 
  • Profiling data  (CORRECT)

That’s correct! The data profiling tool identifies potential issues and anomalies within the dataset. It does this by examining aspects of the data such as completeness, accuracy, uniqueness, and consistency.

3. If you need fine-grained control over data transformations or want to build a modular data model, ________ is the preferred option. On the other hand, if you want a scalable and user-friendly approach to data preparation, ________ offer a more intuitive experience.

  • referencing queries, dataflows (CORRECT)
  • dataflows, referencing queries 
  • referencing queries, merge queries

That’s correct! If you need fine-grained control over data transformations or want to build a modular data model, referencing queries is the preferred option. On the other hand, if you want a scalable and user-friendly approach to data preparation, dataflows offer a more intuitive experience.

4. True or False: Power Query provides automatic data type detection, but it is advisable to review and adjust the inferred data types manually. This prevents incorrect data interpretations and reduces memory consumption.

  • True (CORRECT)
  • False

That’s correct! Power Query scans the first 1000 rows to detect the type of data in the columns, but you can review and adjust the data types when needed.
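The reason for reviewing inferred types becomes clearer with a simplified model of sample-based detection. The Python sketch below is not Power Query's actual detection algorithm. The `infer_type` function and the column are hypothetical, and only the 1000-row sample size comes from the answer above.

```python
def infer_type(values, sample_size=1000):
    """Guess a column type by inspecting only the first `sample_size` values."""
    sample = values[:sample_size]
    try:
        for v in sample:
            int(v)  # raises ValueError if the value is not a whole number
        return "whole number"
    except ValueError:
        return "text"

# Hypothetical column: 1000 numeric strings, then a text value in row 1001.
column = [str(i) for i in range(1000)] + ["N/A"]

inferred = infer_type(column)
full_scan = infer_type(column, sample_size=len(column))
print(inferred, full_scan)
```

In this sketch the sample suggests a whole-number column, yet a later row would fail conversion, which is why a manual review of the data type is worthwhile.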

5. True or False: Power BI dataflows can be used to fetch data from real-time data sources.

  • True
  • False (CORRECT)

6. Using ________ you can perform advanced data manipulation tasks such as conditional filtering, custom column creation, data type conversions, and merging multiple data sources.

  • M Language (CORRECT)
  • Reference Queries
  • Dataflows

That’s correct! Using M Language, you can perform advanced data manipulation tasks such as conditional filtering, custom column creation, data type conversions, and merging multiple data sources.

7. True or False: By performing expensive operations such as sorting, grouping, or aggregating at the beginning of the data transformation pipeline, Power BI can optimize query execution and minimize the amount of data that needs to be processed during analysis, resulting in faster query response times.

  • True
  • False (CORRECT)

That’s correct!  To optimize performance and streamline your data transformation workflow, it is recommended to prioritize and schedule resource-intensive operations, such as sorting, grouping, or aggregating, towards the end of the process. By doing so, you can leverage the results of previous transformations and avoid redundant computations, resulting in improved efficiency and reduced processing time.

8. Consider the volume and complexity of your data. Some ________ may perform better with large datasets or have optimizations for specific scenarios.

  • Transformations
  • Connectors (CORRECT)
  • loading options

That’s correct! When evaluating connectors, it’s important to take into account the size and complexity of your data, as certain connectors may offer improved performance for handling large datasets or specialized optimizations tailored to specific use cases.

9. Which of the following statements regarding error handling in Power BI is true?

  • Error handling in Power BI is not necessary as errors are automatically handled by the system.
  • Error handling in Power BI lets you display error messages and does not provide options for resolution.
  • Error handling in Power BI allows you to define custom actions when encountering errors during data transformation or query execution. (CORRECT)

That’s correct! Error handling in Power BI allows you to define custom actions when encountering errors during data transformation or query execution.

10. True or False: You may sometimes need to use M language to apply custom transformations which could not be created using only the Power Query user interface.

  • True  (CORRECT)
  • False 

That’s correct! You can apply custom transformations using the M language in Advanced Editor which can’t be accomplished using the Power Query user interface alone.

11. True or False: When loading data into Power BI for offline analysis, you should choose the Direct Query option.

  • True 
  • False  (CORRECT)

That’s correct! Choose Import, not Direct Query, to load the data into Power BI for offline analysis.

12. How many rows does Power Query scan to detect the type of data in the columns?

  • 1000 (CORRECT)
  • 100
  • 10000

That’s correct! Power Query scans the first 1000 rows to detect the type of data in the columns.

13. ________ provide a centralized platform for businesses to manage and organize their data effectively.

  • Advanced Editor and M Language
  • Dataflows (CORRECT)
  • Reference Queries

That’s correct. Dataflows provide a centralized platform for businesses to manage and organize their data effectively.

14. Using ________ you can perform advanced data manipulation tasks such as conditional filtering, custom column creation, data type conversions, and merging multiple data sources.

  • M Language (CORRECT)
  • Reference Queries
  • Dataflows

That’s correct! Using M Language, you can perform advanced data manipulation tasks such as conditional filtering, custom column creation, data type conversions, and merging multiple data sources.

15. True or False: When selecting a connector in Power BI, the choice should solely be based on the availability of the connector, regardless of the data source and specific requirements of the project.

  • True
  • False (CORRECT)

That’s correct! When choosing the right connector in Power BI, it is essential to consider the data source and specific requirements of the project. Not all connectors are suitable for every data source, and some connectors offer more optimized performance or advanced capabilities for specific scenarios. It is important to evaluate the compatibility, performance, and capabilities of each connector to ensure the best fit for your data source and project needs.

16. Use ____________ techniques such as conditional logic and custom error messages to ensure smooth data processing.

  • performance consideration
  • error handling (CORRECT)
  • data validation

That’s correct! Use error-handling techniques such as conditional logic and custom error messages to ensure smooth data processing.
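As an illustration of these two techniques, the Python sketch below combines conditional logic (checking for empty input) with custom error messages (describing what went wrong) so that one bad value does not stop the whole process. In Power Query this would be written in M. The `to_number` function and its messages are hypothetical.

```python
def to_number(value, default=None):
    """Convert a value to a number; return (result, message) instead of failing."""
    # Conditional logic: handle empty input before attempting a conversion.
    if value is None or str(value).strip() == "":
        return default, "empty value replaced with default"
    try:
        return float(value), None
    except ValueError:
        # Custom error message describing the problem value.
        return default, f"could not convert {value!r} to a number"

# Hypothetical raw column containing an empty cell and a bad entry.
raw = ["12.5", "", "abc", "7"]
converted = [to_number(v, default=0.0) for v in raw]
for result, message in converted:
    print(result, message)
```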

17. Which query language do you use in Advanced Editor in Power Query?

  • T-SQL
  • M   (CORRECT)
  • DAX 

That’s correct! Using the M Language, you can perform advanced data manipulation tasks. These could include conditional filtering, custom column creation, data type conversions, and merging multiple data sources in Power Query.

18. Which storage mode leaves the data at the data source?

  • Dual
  • Import 
  • Direct Query  (CORRECT)

That’s correct! When you choose Direct Query as a storage mode, the data remains at the data source.

19. True or False: Data profiling empowers users to discover hidden insights, uncover data inconsistencies, and make data-driven decisions with confidence.

  • True (CORRECT)
  • False 

That’s correct!  By performing data profiling, users can identify patterns, trends, and anomalies within the dataset that may not be immediately apparent. It helps to uncover hidden insights and potential correlations that can provide valuable information for decision-making. 

20. You should prioritize and schedule resource-intensive operations ____________ your data transformation process.

  • at the beginning of
  • at the end of  (CORRECT)
  • during

That’s correct! You should prioritize and schedule resource-intensive operations at the end of your data transformation process.

21. You want to create a business report using Power BI that includes newly added data rows to a table. It is important for the data to be up to date, but you also need your report to run smoothly and efficiently. Which option would be the most appropriate to choose?

  • Dual Mode  (CORRECT)
  • Direct Query
  • Import

That’s correct! Dual storage mode combines the Import and Direct Query modes. This setting allows the Power BI service to determine the most efficient mode to use on a query-by-query basis.

22. What are the key benefits of using Power BI dataflows in data analysis and reporting? 

  • Enhanced data cleansing and transformation features.
  • Real-time data processing and streaming capabilities.
  • Seamless integration with third-party data sources.  (CORRECT)

23. True or False: In Power BI, M language and Advanced Editor provide advanced functions and customization options for data transformation.

  • True  (CORRECT)
  • False

That’s correct! M language is the formula language of Power Query, and it gives you more control over your data transformation steps. The Advanced Editor tool enables you to perform M language-based data transformation in a more detailed manner. This allows you to perform operations such as data retrieval, cleansing, transformation, and merging from data sources in a more flexible way.

24. Why is it important to consider authentication in Power BI connectors?

  • Authentication ensures the accuracy and integrity of data.
  • Authentication prevents unauthorized access to sensitive information.  (CORRECT)
  • Authentication improves the performance of data processing.

That’s correct! Authentication prevents unauthorized access to sensitive information. Power BI offers various authentication methods, such as Windows, Azure Active Directory, OAuth, and others.

CONCLUSION – Advanced ETL in Power BI

In conclusion, this module equips you with a deep understanding of the load portion of the ETL process and provides you with hands-on experience in data profiling and advanced querying. By mastering these critical components, you will be adept at ensuring data is properly loaded, well-structured, and of high quality, ready for in-depth analysis. These skills are essential for effective data management and will prepare you to handle complex data tasks with confidence, setting a solid foundation for advanced analytical and reporting activities in future coursework and professional endeavors.