COURSE 2: THE PATH TO INSIGHTS: DATA MODELS AND PIPELINES

Module 2: Dynamic Database Design

GOOGLE BUSINESS INTELLIGENCE PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

INTRODUCTION – Dynamic Database Design

In this advanced course, the exploration into database systems deepens, offering a comprehensive understanding of essential components such as data marts, data lakes, data warehouses, and ETL processes. As learners delve into the intricacies of these systems, they gain insights into how each contributes to the overall architecture, storage, and retrieval of data within a business intelligence framework. The course goes beyond the basics, providing a nuanced perspective on the strategic use of data structures to optimize performance and facilitate efficient decision-making processes.

A significant focus of the course lies in unpacking the five critical factors that influence database performance: workload, throughput, resources, optimization, and contention. By comprehensively examining these factors, participants gain the tools to assess, troubleshoot, and optimize database systems effectively. Understanding how these elements interplay is crucial for BI professionals seeking to enhance the efficiency and responsiveness of their data infrastructure.

Learning Objectives

  • Discover strategies to create an ETL process that meets organizational and stakeholder needs, and to maintain that process efficiently.
  • Understand what the different data storage and extraction processes and tools may include (Extract/Load: Stitch, Segment, Fivetran; Transform: dbt, Airflow, Looker).
  • Explain how to optimize when building new tables.
  • Identify and describe where new tables can fit in the pipeline.
  • Recognize the different aspects of databases, including OLAP and OLTP, columnar and relational, distributed and single-homed databases.
  • Understand the importance of database performance and optimization.
  • Describe the five factors of database performance: workload, throughput, resources, optimization, and contention.
  • Perform pipeline debugging using queries.

PRACTICE QUIZ: DATABASE PERFORMANCE

1. Fill in the blank: A data mart is a _____ database that can be a subset of a larger data warehouse. This means it is a convenient way to access the data pertaining to specific areas or departments of a business.

  • Specialized
  • Categorical
  • Subject-oriented (CORRECT)
  • Departmental

Correct: A data mart is a subject-oriented database that can be a subset of a larger data warehouse. This means it is a convenient way to access the data pertaining to specific areas or departments of a business.
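To make the idea concrete, here is a minimal Python sketch using the standard-library sqlite3 module. The table, view, and department names are hypothetical, and a SQL view is only one way to carve out a mart; in practice a data mart is often a separately stored and maintained database.

```python
import sqlite3

# Hypothetical warehouse table holding data for every department.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (order_id INTEGER, department TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [(1, "Marketing", 120.0), (2, "Finance", 75.5), (3, "Marketing", 60.0)],
)

# The "mart" is a subject-oriented subset: only the Marketing department's rows.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT order_id, amount FROM warehouse_sales WHERE department = 'Marketing'"
)

rows = conn.execute(
    "SELECT order_id, amount FROM marketing_mart ORDER BY order_id"
).fetchall()
print(rows)  # [(1, 120.0), (3, 60.0)]
```

Marketing analysts can query the mart without filtering through the rest of the warehouse.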

2. A business intelligence team manager wants to support their team’s ability to perform at a high level. They investigate the overall capability of their company’s database hardware and software tools to enable the team to process stakeholder requests. In this situation, which of the factors of database performance do they consider?

  • Workload
  • Resources
  • Throughput (CORRECT)
  • Optimization

Correct: They consider throughput. Throughput describes the overall capability of the database’s hardware and software to process requests.
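Throughput is often expressed as requests processed per unit of time. The sketch below is an illustrative assumption, not a formal benchmark: it times a repeated query against a hypothetical in-memory SQLite table and reports requests per second. Real measurements depend on the hardware, the software, and the concurrent workload.

```python
import sqlite3
import time


def measure_throughput(conn, query, n_requests=1000):
    """Return how many times per second the connection can run `query`."""
    start = time.perf_counter()
    for _ in range(n_requests):
        conn.execute(query).fetchall()
    elapsed = time.perf_counter() - start
    return n_requests / elapsed


# Hypothetical table of stakeholder requests.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO requests (status) VALUES (?)", [("open",), ("closed",)] * 500
)

rps = measure_throughput(conn, "SELECT COUNT(*) FROM requests WHERE status = 'open'")
print(f"{rps:.0f} requests/second")
```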

3. What term is used to describe data that is broken up into many pieces that are not stored together?

  • Split data
  • Modified data
  • Archived data
  • Fragmented data (CORRECT)

Correct: Fragmentation most often occurs when the data is used frequently, when new data files are created, or when existing data files are modified or deleted.
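SQLite offers a small, observable example. Its documentation notes that frequent inserts, updates, and deletes can leave a database file fragmented, and its VACUUM command rebuilds the file so each table is stored largely contiguously again. In this sketch (the table name and row counts are hypothetical), deleting a large block of rows leaves free pages scattered in the file, counted by PRAGMA freelist_count, and VACUUM reclaims them.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "frag_demo.db")
conn = sqlite3.connect(path)
# Keep freed pages inside the file (no auto-vacuum) so the effect is visible.
conn.execute("PRAGMA auto_vacuum = NONE")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)", [("x" * 200,)] * 10000)
conn.commit()

# Deleting a large block of rows leaves empty pages behind in the file.
conn.execute("DELETE FROM events WHERE id <= 5000")
conn.commit()
free_before = conn.execute("PRAGMA freelist_count").fetchone()[0]

# VACUUM rebuilds the file so the remaining rows are stored together again.
conn.execute("VACUUM")
free_after = conn.execute("PRAGMA freelist_count").fetchone()[0]
print(free_before, free_after)
conn.close()
```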

QUIZ: MODULE 2 CHALLENGE

1. Which of the following statements accurately describe data marts and data lakes? Select all that apply.

  • Data lakes are subject-oriented, which means they are associated with specific areas or departments of a business.
  • Data marts are designed to enable information accessibility because their data doesn’t require a lot of processing.
  • Data lakes are designed to enable information accessibility because their data doesn’t require a lot of processing. (CORRECT)
  • Data marts are subject-oriented, which means they are associated with specific areas or departments of a business. (CORRECT)

2. Fill in the blank: A business intelligence professional gathers data, loads it into a unified destination system, and then transforms it into a useful format. They do this using an _____ data pipeline.

  • oriented
  • ELT (CORRECT)
  • interpreted
  • ETL

3. What is a measure of the workload that can be processed by a database, as well as the associated costs?

  • Scalability
  • Maturity
  • Database performance (CORRECT)
  • Distribution

4. A business intelligence professional is considering the transactions, queries, analyses, and system commands being processed by a database system. Which of the five factors of database performance are they evaluating?

  • Workload (CORRECT)
  • Throughput
  • Optimization
  • Contention

5. Which of the following statements accurately describe database resources? Select all that apply.

  • Resources may not be shared with other users.
  • Resources include disk space and memory. (CORRECT)
  • Resources include hardware and software tools. (CORRECT)
  • Resources can be both internal and external. (CORRECT)

6. Optimization involves decreasing _____, which is how long it takes for a database to respond to a user request.

  • scope
  • data view
  • contention
  • response time (CORRECT)

7. Fill in the blank: In a relational database system that uses SQL, a _____ describes how the database system will execute a query.

  • query plan (CORRECT)
  • run method
  • HOW statement
  • data limitation
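Most SQL engines let you inspect the query plan directly. SQLite is used here only because it ships with Python; other systems expose plans through their own EXPLAIN variants. The sketch (hypothetical sales table and index name) collects the plan for the same query before and after adding an index: the first plan typically scans the whole table, while the second searches using the index. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("West", 10.0), ("East", 20.0), ("North", 30.0)] * 100,
)

query = "SELECT SUM(amount) FROM sales WHERE region = 'West'"


def plan(sql):
    # Each EXPLAIN QUERY PLAN row has four columns; the last one is the readable step.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(query)  # typically a full-table SCAN of sales
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
after = plan(query)   # typically a SEARCH ... USING INDEX idx_sales_region
print(before)
print(after)
```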

8. Fill in the blank: A business intelligence team uses _____ to divide their cloud database system into logical parts. This helps improve query processing and manageability.

  • the SPLIT function
  • data partitioning (CORRECT)
  • database migration
  • metadata

9. Fragmented data is broken up into many pieces that are not stored together. What are some common reasons for this fragmentation? Select all that apply.

  • Using the data infrequently
  • Modifying data files (CORRECT)
  • Deleting data files (CORRECT)
  • Creating new data files (CORRECT)

10. When two or more data analysts attempt to use a single data resource in a conflicting way, what is the result?

  • Redundancy
  • Duplicates
  • Contention (CORRECT)
  • Argument

11. What business intelligence tool enables data to be gathered from different sources, then loaded into a unified destination system and transformed into a useful format?

  • Data lake
  • ELT (CORRECT)
  • ETL
  • Data mart

12. Fill in the blank: Database performance is a measure of the workload that can be _____ by a database, as well as the associated costs.

  • measured
  • processed (CORRECT)
  • stored
  • visualized

13. Which of the following statements accurately describes workload with regard to database performance?

  • Workload involves maximizing the speed and efficiency with which data is retrieved in order to ensure high levels of database performance.
  • Workload is the combination of transactions, queries, analysis, and system commands being processed by the database system at any given time. (CORRECT)
  • Workload is the overall capability of the database’s hardware and software to process requests.
  • Workload involves two or more components attempting to use a single resource in a conflicting way.
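As a rough illustration of workload as a mix of activity at a given time, the sketch below tallies a hypothetical statement log into the four categories named in the correct answer. The keyword rules, in particular treating a GROUP BY query as an analysis, are simplifying assumptions for the example, not how any specific database classifies its workload.

```python
from collections import Counter

# Hypothetical log of statements a database received in one time window.
statement_log = [
    "BEGIN",
    "INSERT INTO orders VALUES (1, 'West', 10.0)",
    "COMMIT",
    "SELECT * FROM orders WHERE id = 1",
    "SELECT region, SUM(amount) FROM orders GROUP BY region",
    "VACUUM",
]


def categorize(statement):
    """Assign a statement to a workload category using assumed keyword rules."""
    sql = statement.strip().upper()
    if sql.startswith(("BEGIN", "COMMIT", "ROLLBACK", "INSERT", "UPDATE", "DELETE")):
        return "transaction"
    if sql.startswith("SELECT") and "GROUP BY" in sql:
        return "analysis"
    if sql.startswith("SELECT"):
        return "query"
    return "system command"


workload = Counter(categorize(s) for s in statement_log)
print(dict(workload))
```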

14. Which of the following statements accurately describe database resources? Select all that apply.

  • Resources do not fluctuate.
  • Only internal factors affect resource performance.
  • External factors can affect resource performance. (CORRECT)
  • Resources can fluctuate. (CORRECT)

15. A business intelligence team is optimizing the performance of their database. What does this involve? Select all that apply.

  • Evaluating the effectiveness of the team’s spreadsheets
  • Examining resource use (CORRECT)
  • Identifying better data sources and structures (CORRECT)
  • Comparing workload to cost (CORRECT)

16. Fill in the blank: A query plan describes the _____ involved with executing a query by a relational database.

  • spreadsheets
  • reasoning
  • steps (CORRECT)
  • business strategy

17. A business intelligence team can cause _____ when two or more data analysts attempt to use a single data resource in a conflicting way.

  • annotation
  • verification
  • contention (CORRECT)
  • repetition

18. What does database performance measure? Select all that apply.

  • Improvements made to data tools and processes
  • The ability of the database to be reconfigured
  • Any costs associated with the workload being processed by the database (CORRECT)
  • The workload that can be processed by the database (CORRECT)

19. Which of the following statements accurately describe indexes versus data partitions? Select all that apply.

  • Indexes can only locate one section of a table at a time.
  • Data partitioning is typically used in cloud-based systems handling big data. (CORRECT)
  • Data partitioning is the process of dividing a database into distinct, logical parts. (CORRECT)
  • Indexes are organizational tags used to locate data. (CORRECT)

20. Fill in the blank: Fragmented data occurs when data is broken up into many pieces that are not _____, often as a result of using the data frequently.

  • structured
  • sorted and filtered
  • stored together (CORRECT)
  • labeled

21. There are four main reasons why data becomes fragmented. The first is using the data files frequently. What are the other three?

  • Clearing data files from the cache
  • Deleting data files (CORRECT)
  • Modifying the data files (CORRECT)
  • Creating new data files (CORRECT)

22. Which of the following statements accurately describe data marts and data lakes? Select all that apply.

  • A data mart is a database system that stores large amounts of raw data in its original format until it’s needed.
  • A data lake is a subject-oriented database that can be a subset of a larger data warehouse.
  • A data mart is a subject-oriented database that can be a subset of a larger data warehouse. (CORRECT)
  • A data lake is a database system that stores large amounts of raw data in its original format until it’s needed. (CORRECT)

23. A business intelligence professional is investigating the steps their database system takes in order to execute a query. They discover that creating a new table will enhance performance. What does this scenario describe?

  • Limiting data
  • Evaluating contentions
  • Checking a query plan (CORRECT)
  • Considering run methodology

24. Fill in the blank: Contention occurs when two or more data analysts attempt to use a _____ in a conflicting way.

  • section of a spreadsheet
  • series of reports
  • single data resource (CORRECT)
  • data strategy

25. When evaluating a database system’s resources, what does a business intelligence professional consider? Select all that apply.

  • Users
  • Disk space and memory (CORRECT)
  • Software (CORRECT)
  • Hardware (CORRECT)

26. When evaluating the workload of a database system, what does a business intelligence professional consider? Select all that apply.

  • Context
  • Queries and analyses (CORRECT)
  • System commands (CORRECT)
  • Transactions (CORRECT)

27. What are some key benefits of ELT data pipelines in business intelligence? Select all that apply.

  • ELT enables business intelligence professionals to transform data while it is being transported.
  • ELT reduces storage costs and enables businesses to scale storage and computation resources independently. (CORRECT)
  • ELT enables business intelligence professionals to transform only the data they need. (CORRECT)
  • ELT can ingest many different kinds of data into a storage system as soon as that data is available. (CORRECT)

28. Fill in the blank: The goal of _____ is to enable a database system to process the largest possible workload at the most reasonable cost.

  • visibility
  • optimization (CORRECT)
  • business intelligence strategy
  • application development

29. Fill in the blank: A data lake is a database system that stores large amounts of _____ in its original format until it’s needed.

  • live data
  • structured data
  • clean data
  • raw data (CORRECT)

Correct: A data lake is a database system that stores large amounts of raw data in its original format until it’s needed. While the raw data has been tagged to be identifiable, it is not organized.

30. What is the term for a pipeline that extracts, loads, then transforms the data?

  • Warehouse
  • ETL
  • Lineage
  • ELT (CORRECT)

Correct: ELT is a pipeline that extracts, loads, then transforms the data. It enables data to be gathered from data lakes, loaded into a unified destination system, and transformed into a useful format.
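The three ELT steps can be sketched in Python, with the standard-library csv and sqlite3 modules standing in for a real source system and destination warehouse. The CSV contents and table names are hypothetical. The point of the ordering is that raw data is loaded as-is first, and the transformation runs afterwards inside the destination, only on the data the report needs.

```python
import csv
import io
import sqlite3

# Extract: a hypothetical raw export from a source system, kept exactly as received.
raw_csv = "order_id,region,amount\n1,West,10.50\n2,East,\n3,West,4.25\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Load: land the raw rows, untransformed (everything as TEXT), in a staging table.
dest = sqlite3.connect(":memory:")
dest.execute("CREATE TABLE raw_orders (order_id TEXT, region TEXT, amount TEXT)")
dest.executemany("INSERT INTO raw_orders VALUES (:order_id, :region, :amount)", rows)

# Transform: clean and aggregate inside the destination with SQL.
dest.execute(
    """
    CREATE TABLE region_revenue AS
    SELECT region, SUM(CAST(amount AS REAL)) AS revenue
    FROM raw_orders
    WHERE amount <> ''
    GROUP BY region
    """
)
result = dest.execute("SELECT region, revenue FROM region_revenue ORDER BY region").fetchall()
print(result)  # [('West', 14.75)]
```

The East row stays in raw_orders untouched; a later transformation could handle it differently without re-extracting anything.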

31. A database is performing slowly because multiple components are attempting to use the same piece of data at the same time. Which of the factors of database performance should be addressed?

  • Contention (CORRECT)
  • Throughput
  • Workload
  • Resources

Correct: The factor of contention should be addressed. Contention occurs when two or more components attempt to use a single resource in a conflicting way.
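Contention can be reproduced with two SQLite connections to the same database file, each standing in for an analyst. This is a minimal sketch with hypothetical names: SQLite locks the whole database for writes, so the second writer is rejected with an OperationalError while the first holds its transaction open (timeout=0 disables waiting). Other systems lock at a finer grain and may queue requests instead of failing.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "contention_demo.db")

# Two autocommit connections stand in for two analysts sharing one database file.
analyst_a = sqlite3.connect(path, isolation_level=None, timeout=0)
analyst_b = sqlite3.connect(path, isolation_level=None, timeout=0)
analyst_a.execute("CREATE TABLE budget (team TEXT, amount REAL)")

# Analyst A starts a write transaction and holds the write lock.
analyst_a.execute("BEGIN IMMEDIATE")
analyst_a.execute("INSERT INTO budget VALUES ('BI', 100.0)")

# Analyst B tries to write at the same time and conflicts with A's lock.
try:
    analyst_b.execute("INSERT INTO budget VALUES ('Ops', 50.0)")
    blocked = False
except sqlite3.OperationalError as exc:
    blocked = True
    print("Contention:", exc)

# Once A commits and releases the lock, B's retry succeeds.
analyst_a.execute("COMMIT")
analyst_b.execute("INSERT INTO budget VALUES ('Ops', 50.0)")
```

Giving the connection a nonzero timeout makes SQLite wait up to that many seconds for the lock instead of failing immediately.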

32. What is the process of dividing a database into distinct, logical parts in order to improve query processing and increase manageability?

  • Data fragmentation
  • Data processing
  • Data partitioning (CORRECT)
  • Data indexing

Correct: Data partitioning is the process of dividing a database into distinct, logical parts in order to improve query processing and increase manageability. Ensuring data is partitioned appropriately is a key part of database performance optimization.
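Partitioning can be illustrated without any particular database. This pure-Python sketch (hypothetical order data) partitions rows by month, so a query for one month reads only that month's partition rather than every row, an effect usually called partition pruning. Cloud warehouses implement partitioning natively, for example BigQuery's date-partitioned tables; the dictionary here only models the idea.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order rows: (order_date, amount).
orders = [
    (date(2024, 1, 5), 100.0),
    (date(2024, 1, 20), 50.0),
    (date(2024, 2, 3), 75.0),
    (date(2024, 3, 14), 25.0),
]

# Partition by month: each (year, month) key holds only that month's rows.
partitions = defaultdict(list)
for order_date, amount in orders:
    partitions[(order_date.year, order_date.month)].append((order_date, amount))


def monthly_total(year, month):
    # Partition pruning: read only the partition the query needs,
    # instead of scanning every row in the table.
    return sum(amount for _, amount in partitions.get((year, month), []))


print(monthly_total(2024, 1))  # 150.0
print(sorted(partitions))      # [(2024, 1), (2024, 2), (2024, 3)]
```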

CONCLUSION – Dynamic Database Design

In conclusion, this advanced course stands as a comprehensive journey into the intricate landscape of database systems, offering a deep dive into data marts, data lakes, data warehouses, and the crucial ETL processes that underpin modern business intelligence frameworks. Participants emerge with a nuanced understanding of how these components interconnect to shape the architecture and functionality of sophisticated data ecosystems.

The course’s spotlight on the five pivotal factors influencing database performance—workload, throughput, resources, optimization, and contention—provides a strategic lens for participants to evaluate and optimize their data infrastructure effectively. Armed with this knowledge, BI professionals can navigate the complexities of database management with confidence, ensuring that their systems not only meet but exceed performance expectations.

As learners complete their journey, the ability to design efficient queries becomes a powerful tool in their arsenal. The course equips them with the skills to craft queries that not only align with business goals but also maximize the potential of the system’s resources. This newfound proficiency positions participants at the forefront of leveraging advanced database systems to drive organizational success, reinforcing their role as adept architects of data-driven solutions in the dynamic realm of business intelligence.