COURSE 2: THE PATH TO INSIGHTS: DATA MODELS AND PIPELINES

Module 1: Data Models and Pipelines

GOOGLE BUSINESS INTELLIGENCE PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

INTRODUCTION – Data Models and Pipelines

Embarking on this course is an immersive journey into the core aspects of data management. It begins with a deep dive into data modeling, where the intricacies of common schemas and essential database elements are unraveled. The course emphasizes the symbiotic relationship between business needs and the database systems implemented by BI professionals, providing insights into the strategic decisions that shape these crucial infrastructures.

As the course progresses, the spotlight shifts to pipelines and ETL processes, indispensable tools that underpin the movement and accessibility of data. Pipelines serve as the conduits connecting diverse data sources, while ETL processes meticulously transform raw data into refined, usable formats. The significance of these processes becomes evident as they play a crucial role in maintaining data integrity, consistency, and relevance throughout its journey within an organization.
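The extract, transform, and load stages described above can be sketched in Python (the language this course itself introduces). This is a minimal in-memory illustration, not a production pipeline: the source records, field names, and "warehouse" list are all invented for the example.

```python
# Minimal sketch of the three ETL stages using in-memory data.
# Real pipelines read from and write to external systems.

raw_orders = [
    {"id": "1", "amount": "19.99", "region": "west "},
    {"id": "2", "amount": "5.00", "region": "EAST"},
]

def extract(source):
    """Extract: access the source system and collect its records."""
    return list(source)

def transform(records):
    """Transform: cast types and normalize formatting into a usable shape."""
    return [
        {
            "id": int(r["id"]),
            "amount": float(r["amount"]),
            "region": r["region"].strip().lower(),
        }
        for r in records
    ]

def load(records, target):
    """Load: deliver the refined records to the target destination."""
    target.extend(records)

warehouse = []
load(transform(extract(raw_orders)), warehouse)
```

Each stage is a separate function so that, as in real pipelines, the stages can be tested, rerun, or swapped independently.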

Learning Objectives

  • Identify and define key database models and schemas.
  • Assess which database design pattern and schema is appropriate for different data.
  • Discuss data model alternatives that are optimal, performant, and adherent to reporting requirements, taking into account current data size and growth.
  • Define ETL and explain how it works.
  • Identify key information from stakeholders necessary to create a data pipeline.
  • Describe different types of pipelines.
  • Describe the key stages of a data pipeline.
  • Understand what a data pipeline is, its objectives, and how it works.

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: DATA MODELING, SCHEMAS, AND DATABASES

1. A business intelligence professional stores large amounts of raw data in its original format within a database system. Then, they can access the data whenever they need it for their BI project. What type of database are they using?

  • Data structure
  • Data lake (CORRECT)
  • Data ecosystem
  • Data storage

Correct: They are using a data lake. A data lake is a database system that stores large amounts of raw data in its original format until it’s needed.

2. Which of the following statements correctly describes primary and foreign keys? Select all that apply.

  • A foreign key references a column or a group of columns in which each row uniquely identifies each record in the table.
  • A primary key forms connections because it is a field within a database table that’s a primary key in another table.
  • A primary key references a column or a group of columns in which each row uniquely identifies each record in the table. (CORRECT)
  • A foreign key forms connections because it is a field within a database table that is a primary key in another table. (CORRECT)

Correct: A primary key references a column or a group of columns in which each row uniquely identifies each record in the table. A foreign key forms connections because it is a field within a database table that is a primary key in another table.
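The key relationship described above can be demonstrated with Python's built-in sqlite3 module. The table and column names here are invented for illustration; note that SQLite only enforces foreign keys once the `foreign_keys` pragma is enabled.

```python
import sqlite3

# Sketch: a primary key uniquely identifies each record; a foreign key is a
# field that is a primary key in another table and forms the connection.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- uniquely identifies each record
        name TEXT
    )""")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id)  -- foreign key
    )""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # valid: customer 1 exists

rejected = False
try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")  # no customer 99
except sqlite3.IntegrityError:
    rejected = True  # the foreign key constraint blocks the orphan row
```

The failed insert shows the connection in action: a foreign key value must exist as a primary key in the referenced table.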

3. Fill in the blank: A _____ schema is an extension of a star schema, which contains additional dimensions.

  • dimensional
  • snowflake (CORRECT)
  • geometric
  • responsive

Correct: A snowflake schema is an extension of a star schema, which contains additional dimensions.

4. What type of database stores relevant data locally for analysis and less relevant data remotely?

  • Combined systems
  • Separated storage and computing systems (CORRECT)
  • Single-homed databases
  • Distributed databases

Correct: Separated storage and computing systems are databases that store relevant data locally for analysis and less relevant data remotely.

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: CHOOSE THE RIGHT DATABASE

1. When designing a data warehouse, BI professionals take into account which of the following considerations? Select all that apply.

  • The visualization tools that will be used for the dashboard
  • The business needs (CORRECT)
  • The shape and volume of the data (CORRECT)
  • The model that the data warehouse will follow (CORRECT)

Correct: When designing a data warehouse, BI professionals take into account the business needs, the shape and volume of data, and what model the data warehouse will follow.

2. Fill in the blank: Logical data modeling involves representing different _____ in a physical data model.

  • Schemas
  • Tables (CORRECT)
  • Permissions
  • Users

Correct: Logical data modeling involves representing different tables in a physical data model.

3. A BI professional considers the relevant data for a project, the names and data types of table columns, formatting of data entries, and unique keys for database entries and objects. What will these activities enable them to accomplish? 

  • Select appropriate elements for their database schema (CORRECT)
  • Understand the logic behind their data warehouse design
  • Verify their business rules
  • Differentiate between several types of databases

Correct: These activities will enable the BI professional to select appropriate elements for their database schema.

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: HOW DATA MOVES

1. What is the term for the predetermined locations where pipeline data is sent in order to be acted on?

  • Application interface
  • Installations
  • Target tables (CORRECT)
  • Data models

Correct: Target tables are the predetermined locations where pipeline data is sent in order to be acted on.

2. A BI professional uses a pipeline to access source systems, then reads and collects the necessary data from within them. Which ETL stage does this scenario describe?

  • Transform
  • Extract (CORRECT)
  • Load
  • All three stages involve this step

Correct: This describes the extract stage. During extraction, the pipeline accesses source systems, then reads and collects the necessary data from within them.

3. Many BI tools are built upon similar principles and often have similar utilities. Therefore, a BI professional’s general understanding of one tool can be applied to others. What is this an example of?

  • System scalability
  • A key performance indicator
  • A transferable skill (CORRECT)
  • Using a preferred vendor

Correct: Applying knowledge of one tool to another is an example of a transferable skill.

QUIZ: MODULE 1 CHALLENGE


1. Which of the following statements accurately describes Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) tools?

  • Both are optimized for processing. But Online Analytical Processing (OLAP) tools are also optimized for analysis and can analyze data from multiple databases. (CORRECT)
  • Both can analyze data from multiple databases. But Online Transaction Processing (OLTP) tools are optimized for analysis.
  • Both are optimized for analysis and processing. But Online Transaction Processing (OLTP) source systems can analyze data from multiple databases.
  • Both are optimized for processing. But Online Transaction Processing (OLTP) tools are also optimized for analysis and can analyze data from multiple databases.

2. What does a design pattern enable business intelligence professionals to accomplish? Select all that apply.

  • Aggregate data
  • Set user permissions
  • Use relevant measures and facts (CORRECT)
  • Create a model that supports business needs (CORRECT)

3. Fill in the blank: A primary key references a _____ to uniquely identify records in the table.

  • row or group of rows
  • column or group of columns (CORRECT)
  • query
  • schema

4. A dimensional model is a type of relational model that has been optimized in order to achieve what goal?

  • Clean and filter data
  • Quickly retrieve data from a warehouse (CORRECT)
  • Avoid redundant data
  • Anonymize stored data

5. In a dimensional model, what is used to connect a dimension table to the appropriate fact table?

  • Primary key
  • Data map
  • Index
  • Foreign key (CORRECT)

6. How many fact tables exist in a star schema?

  • One (CORRECT)
  • Five
  • An infinite number
  • Zero

7. Members of a business intelligence team are transitioning their current database schemas to a new, desired state. They add columns, remove elements, and make other improvements. What does this situation describe?

  • Database iteration
  • Data partitioning
  • Data throughput
  • Database migration (CORRECT)

8. What are some key benefits of columnar databases? Select all that apply.

  • Sorting data quickly by row
  • Identifying appropriate data to solve a business problem
  • Processing analytical queries effectively (CORRECT)
  • Retrieving information from specific columns (CORRECT)

9. A business intelligence team is working with a database system that is a collection of data systems spread across multiple physical locations. What type of database system are they using?

  • Distributed database system (CORRECT)
  • Combined database system
  • Separated storage and computing system
  • Single-homed database system

10. Fill in the blank: A database schema must describe ___ because this is necessary when users want to understand how the data is shaped and the relationships within the database.

  • data partitions
  • relevant data (CORRECT)
  • system commands
  • business rules

11. Which of the following statements accurately describe the loading stage of the ETL process? Select all that apply.

  • Data is returned to its source.
  • Once the data has been delivered, it can exist within multiple locations. (CORRECT)
  • Once the data has been delivered, it can exist in multiple formats. (CORRECT)
  • Data is delivered to its target destination. (CORRECT)

12. Which of the following statements correctly describe Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) tools? Select all that apply.

  • Online Analytical Processing (OLAP) tools are optimized for data processing instead of analysis.
  • Online Transaction Processing (OLTP) tools are optimized for analysis in addition to processing and can analyze data from multiple databases.
  • Online Transaction Processing (OLTP) tools are optimized for data processing instead of analysis. (CORRECT)
  • Online Analytical Processing (OLAP) tools are optimized for analysis in addition to processing and can analyze data from multiple databases. (CORRECT)

13. Which of the following statements accurately describe primary keys? Select all that apply.

  • Several primary keys can exist in a table.
  • Only one primary key can exist in a table. (CORRECT)
  • Primary keys can appear as foreign keys in other tables. (CORRECT)
  • A primary key uniquely identifies each record in a table. (CORRECT)

14. In a dimensional model, what might facts represent? Select all that apply.

  • Supporting details
  • Attributes
  • Metrics (CORRECT)
  • Events (CORRECT)

15. In a snowflake schema, what is used to break down the schema into even more specific tables? Select all that apply.

  • Query plans
  • Fact tables
  • Dimensions (CORRECT)
  • Subdimensions (CORRECT)

16. A business intelligence team wants to improve the state of their database schemas. While working toward this goal, they move data from one source platform to another target database. What process does this situation describe?

  • Database iteration
  • Data partitioning
  • Database migration (CORRECT)
  • Data throughput

17. Which of the following statements accurately describe row-based and columnar databases? Select all that apply.

  • In columnar databases, it’s necessary to pull data using all of the rows within the table.
  • In row-based databases, each row in the table is an instance or an entry in the database. (CORRECT)
  • Row-based databases are organized by rows; columnar databases are organized by columns. (CORRECT)
  • Columnar databases are often used in data warehouses. (CORRECT)
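The row-versus-columnar contrast in the question above can be sketched in Python. The sales records below are invented for illustration; they show why analytical queries that touch one column favor columnar organization.

```python
# Row-based: each entry is a complete row, good for transactional workloads
# that read or write whole records at a time.
rows = [
    {"date": "2024-01-01", "product": "widget", "amount": 10.0},
    {"date": "2024-01-02", "product": "gadget", "amount": 25.0},
    {"date": "2024-01-03", "product": "widget", "amount": 10.0},
]

# Columnar: each column's values are stored together, so an analytical
# query such as summing amounts scans only the column it needs.
columns = {
    "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "product": ["widget", "gadget", "widget"],
    "amount": [10.0, 25.0, 10.0],
}

total_from_rows = sum(r["amount"] for r in rows)  # touches every field
total_from_columns = sum(columns["amount"])       # touches one column
```

Both layouts hold the same data; the difference is purely in how it is organized, which is why columnar databases are common in data warehouses.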

18. A business intelligence team is working with a database system that stores and analyzes data in the same place. What type of database system are they using?

  • Single-homed database system
  • Separated storage and computing system
  • Combined database system (CORRECT)
  • Distributed database system

19. There are four key elements within database schemas. The first is relevant data. What are the other three? Select all that apply.

  • Elimination of all database permissions
  • Consistent formatting across data entries (CORRECT)
  • Unique keys for every database entry and object (CORRECT)
  • Names and data types for each column in each table (CORRECT)

20. In which stage of the ETL process is data delivered to a target system?

  • All three stages
  • Extraction
  • Transformation
  • Loading (CORRECT)

21. Object-oriented programming languages _____ data objects with code, enabling them to be used again later.

  • encode and translate
  • interpret and analyze
  • label and define (CORRECT)
  • compile and group

22. Fill in the blank: In order to create an effective data model, business intelligence professionals will often apply a _____, which uses relevant measures and facts to create a model that supports business needs.

  • query plan
  • business rule
  • target table
  • design pattern (CORRECT)

23. In row-based databases, each row in a table is an instance or an entry in the database. How are details about that instance recorded and organized?

  • By table
  • By row
  • By column (CORRECT)
  • By field

24. In the ETL loading stage, what are typical target destinations to which the data might be delivered? Select all that apply.

  • Website application
  • Data lake (CORRECT)
  • Analytics platform (CORRECT)
  • Data warehouse (CORRECT)

25. What type of programming language uses another program to read and execute coded instructions?

  • Object-oriented
  • Interpreted (CORRECT)
  • Functional
  • Compiled

26. Business intelligence professionals use design patterns to apply relevant measures and facts to what process?

  • Creating a data model (CORRECT)
  • Aggregating data
  • Selecting computing systems
  • Setting user permissions

27. Fill in the blank: A foreign key is a field within a database table that is a _____ in another table.

  • query
  • model
  • primary key (CORRECT)
  • schema

28. A business intelligence team is performing a database migration to improve the state of their database schemas. Which of the following activities may occur during their database migration? Select all that apply.

  • Maintaining data in its original source platform
  • Splitting existing data fields (CORRECT)
  • Adding new data tables (CORRECT)
  • Changing data types (CORRECT)

29. Which type of tool can analyze data from multiple databases and is optimized for both analysis and processing?

  • Data lake
  • Online Transaction Processing (OLTP)
  • Data mart
  • Online Analytical Processing (OLAP) (CORRECT)

30. Fill in the blank: In a dimensional model, a foreign key is used to connect a _____ table to the appropriate fact table.

  • relationship
  • dimension (CORRECT)
  • measurement
  • primary

31. What are the key benefits of star and snowflake schemas? Select all that apply.

  • Efficient data analysis
  • Effective data monitoring (CORRECT)
  • High-scale information delivery (CORRECT)
  • More efficient output (CORRECT)

32. Fill in the blank: A database schema must use _____ for each entry within the database in order to build connections between tables and enable users to combine relevant data.

  • data partitions
  • unique keys (CORRECT)
  • system commands
  • business rules

33. In a dimensional model, what might dimensions represent? Select all that apply.

  • Events
  • Metrics
  • Supporting details (CORRECT)
  • Attributes (CORRECT)

34. Which of the following statements accurately describe the loading stage of the ETL process? Select all that apply.

  • Data is returned to its source.
  • Data is delivered to its target destination. (CORRECT)
  • Once the data has been delivered, it can exist within multiple locations. (CORRECT)
  • Once the data has been delivered, it can exist in multiple formats. (CORRECT)

35. Fill in the blank: Python is a programming language that is _____, which means it’s modeled around chunks of code that capture certain information.

  • object-oriented (CORRECT)
  • formulaic
  • code-based
  • information-centered

36. Fill in the blank: In order to create an effective model, a design pattern uses _____ that are important to the business. Select all that apply.

  • resources
  • protocols
  • facts (CORRECT)
  • measures (CORRECT)

Correct: In order to create an effective model, a design pattern uses measures and facts that are important to the business.

37. What type of schema consists of one fact table that references any number of dimension tables?

  • Snowflake
  • Star (CORRECT)
  • Dimensional
  • Relational

Correct: A star schema consists of one fact table that references any number of dimension tables.
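A star schema like the one described above can be sketched with sqlite3: one central fact table whose foreign keys reference any number of dimension tables. All table, column, and value names here are invented for the example.

```python
import sqlite3

# Sketch of a star schema: one fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, day TEXT);
    CREATE TABLE fact_sales (            -- the single central fact table
        sale_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL                      -- the measure (a fact)
    );
    INSERT INTO dim_product VALUES (1, 'widget');
    INSERT INTO dim_date VALUES (10, '2024-01-01');
    INSERT INTO fact_sales VALUES (500, 1, 10, 19.99);
""")

# A typical star-schema query joins the fact table out to its dimensions.
result = conn.execute("""
    SELECT p.name, d.day, f.amount
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_date d ON f.date_id = d.date_id
""").fetchone()
```

A snowflake schema would extend this by normalizing a dimension into further subdimension tables, at the cost of extra joins.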

38. Which database framework features a collection of data systems that exist across many physical locations?

  • OLAP
  • Row-based
  • Columnar
  • Distributed (CORRECT)

Correct: A distributed database framework features a collection of data systems that exist across many physical locations.

39. Fill in the blank: The shape of data refers to the rows and columns of tables within a data warehouse, as well as the _____ of data it contains.

  • availability
  • volume (CORRECT)
  • type
  • visibility

Correct: The shape of data refers to the rows and columns of tables within a data warehouse, as well as the volume of data it contains.

40. What are some of the key processes performed with data pipelines? Select all that apply.

  • Construct data storage systems
  • Automate the extraction, transformation, combination, validation, and loading of data (CORRECT)
  • Define what, where, and how data is combined (CORRECT)
  • Help eliminate errors and latency (CORRECT)

Correct: Data pipelines are used to define what, where, and how data is combined. They automate the processes involved in extracting, transforming, combining, validating, and loading of data. They also help eliminate errors and latency.

41. In which ETL stage would a business intelligence professional map data types from the sources to the target system in order to ensure the data fits the destination?

  • Load
  • Extract
  • Transform (CORRECT)
  • All three stages involve this step

Correct: In the transform stage, a business intelligence professional maps data types from the sources to the target system in order to ensure the data fits the destination.
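The type mapping described in this answer can be sketched as a small transform step in Python. The type map, field names, and source row are all hypothetical; a real pipeline would derive the map from the target system's schema.

```python
# Sketch: during the transform stage, cast each source field so the
# record fits the data types expected by the target destination.

TYPE_MAP = {
    "id": int,                       # source sends strings; target wants INTEGER
    "price": float,                  # target wants REAL
    "in_stock": lambda v: v == "true",  # target wants BOOLEAN
}

def map_types(record):
    """Apply the source-to-target type mapping to one record."""
    return {field: cast(record[field]) for field, cast in TYPE_MAP.items()}

source_row = {"id": "42", "price": "9.50", "in_stock": "true"}
target_row = map_types(source_row)
```

Keeping the mapping in one table-like structure makes it easy to review against the target schema before the load stage runs.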

CONCLUSION – Data Models and Pipelines

In conclusion, this course serves as a pivotal guide into the multifaceted realm of data management, where the synergy between business imperatives and technical implementations comes to the forefront. By unraveling the intricacies of data modeling, common schemas, and database elements, learners gain a comprehensive understanding of how these elements coalesce to meet the unique needs of organizations. The course’s emphasis on the strategic decision-making involved in selecting and implementing database systems underscores the integral role that BI professionals play in shaping data architectures that drive business success.

As the journey progresses to explore pipelines and ETL processes, learners are introduced to the unsung heroes of data movement and transformation. These tools, acting as conduits and choreographers, play a pivotal role in ensuring data’s fluidity and utility. Understanding the significance of pipelines in linking disparate sources and ETL processes in refining raw data enhances one’s grasp of the intricate dance that takes data from its raw state to a refined, actionable form.

Ultimately, this course empowers learners to navigate the evolving landscape of data management, where effective decision-making hinges on the seamless integration of business needs with cutting-edge technologies. By the course’s conclusion, participants are poised to harness the power of data modeling, database systems, pipelines, and ETL processes, unlocking their potential to shape a data-driven future for organizations and making informed strides toward success in the dynamic field of Business Intelligence.