Course 1 – Foundations of Data Science

Module 1: Introduction to data science concepts

GOOGLE ADVANCED DATA ANALYTICS PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

Introduction to data science concepts

You’ll begin with an introduction to the Google Advanced Data Analytics Certificate. Then, you’ll explore the history of data science and ways that data science helps solve problems today.

Learning Objectives

  • Understand program plans and expectations
  • Explore defining details of a data professional career
  • Describe the key concepts to be shared in the program, including learning outcomes

PRACTICE QUIZ: ASSESS YOUR READINESS FOR THE ADVANCED DATA ANALYTICS CERTIFICATE

1. What is the key difference between qualitative and quantitative data?

  • Qualitative data is subjective; quantitative data is specific.
  • Qualitative data measures qualities and characteristics; quantitative data measures numerical facts. (CORRECT)
  • Qualitative data is about the quality of a product or service; quantitative data is about how much of that product or service is available in the marketplace.
  • Qualitative data describes the kind of data being analyzed; quantitative data describes how much data is being analyzed.

Correct: Qualitative data measures qualities and characteristics; quantitative data measures numerical facts.

2. Which of the following statements accurately describes wide and long data? Select all that apply.

  • Long data subjects can have data in multiple columns.
  • Wide data subjects can have multiple rows that hold the values of subject attributes.
  • Wide data subjects can have data in multiple columns. (CORRECT)
  • Long data subjects can have multiple rows that hold the values of subject attributes. (CORRECT)

Correct: Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes.
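
The difference between wide and long data can be sketched with a small reshape. This is a minimal illustration using only the Python standard library; the table contents, the `wide_to_long` helper, and the column names are hypothetical and not part of the course material.

```python
# Hypothetical wide table: one row per subject, one column per attribute.
wide_rows = [
    {"country": "A", "2020": 10, "2021": 12},
    {"country": "B", "2020": 7, "2021": 9},
]


def wide_to_long(rows, id_key):
    """Return one record per (subject, attribute) pair."""
    long_rows = []
    for row in rows:
        for attribute, value in row.items():
            if attribute == id_key:
                continue  # the identifier is repeated on every long row
            long_rows.append(
                {id_key: row[id_key], "attribute": attribute, "value": value}
            )
    return long_rows


# In long form, each subject now spans multiple rows.
long_rows = wide_to_long(wide_rows, "country")
for record in long_rows:
    print(record)
```

In the wide form, each country appears once and its values sit in separate columns. In the long form, each country appears on several rows, one per attribute value.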

3. Structured data is likely to be found in which of the following formats? Select all that apply.

  • Audio file
  • Digital photo
  • Spreadsheet (CORRECT)
  • Database table (CORRECT)

Correct: Structured data is organized in a certain format such as rows and columns. It is likely to be found in a table or spreadsheet. To learn about structured data, review course three of the Google Data Analytics Certificate.

4. Fill in the blank: A Boolean data type can have _____ possible value(s).

  • three
  • infinite
  • one
  • two (CORRECT)

Correct: A Boolean data type can have two possible values.

5. What is the term for the individuals who have invested time and resources in a project and are interested in its outcome?

  • Executives
  • Subject-matter experts
  • Stakeholders (CORRECT)
  • Project sponsors

Correct: Stakeholders are individuals who have invested time and resources in a project and are interested in its outcome.

6. When collecting data for study, what are some reasons to consider sample size? Select all that apply.

  • To eliminate certain segments of a population
  • To include as many participants as possible in the study
  • To make sure a few unusual responses don’t skew results (CORRECT)
  • To collect data that represents a diverse set of perspectives (CORRECT)

Correct: Considering sample size ensures the data represents a diverse set of perspectives and helps avoid skewed results or inaccurate judgments.

7. The SMART methodology can be used to ask a question that promotes change. What type of SMART question leads to change?

  • Action-oriented (CORRECT)
  • Motivational
  • Results-focused
  • Transformational

Correct: A SMART question that promotes change is action-oriented.

8. Which of the following inquiries are leading questions? Select all that apply.

  • How did you learn about our company?
  • What do you enjoy most about our service? (CORRECT)
  • How satisfied were you with our customer representative? (CORRECT)
  • In what ways did our product meet your needs? (CORRECT)

Correct: Leading questions include: How satisfied were you with our customer representative? In what ways did our product meet your needs? And what do you enjoy most about our service? Leading questions direct the respondent to a particular answer, often because they suggest the answer within the question.

9. What are the key characteristics of a metric? Select all that apply.

  • Metrics are unorganized collections of facts.
  • Metrics are quantifiable. (CORRECT)
  • Metrics can be used to evaluate performance. (CORRECT)
  • Metrics are used for measurement. (CORRECT)

Correct: Metrics are quantifiable data types used for measurement and performance evaluation.

10. Which type of bias is the tendency to construe ambiguous situations in a positive or negative way?

  • Confirmation bias
  • Cultural bias
  • Interpretation bias (CORRECT)
  • Observer bias

Correct: Interpretation bias is the tendency to construe ambiguous situations in a positive or negative way.

11. Before completing a survey, an individual acknowledges reading information about how and why the data they provide will be used. What concept does this describe?

  • Privacy
  • Transaction transparency
  • Openness
  • Consent (CORRECT)

Correct: This concept is called consent. Consent is the aspect of data ethics that presumes an individual’s right to know how and why their personal data will be used before agreeing to provide it.

12. Which spreadsheet tool changes how cells appear when values meet a specific condition?

  • Alternating colors
  • Protected ranges
  • Conditional formatting (CORRECT)
  • Data validation

Correct: Conditional formatting is the spreadsheet tool that changes how cells appear when values meet a specific condition.

13. Fill in the blank: In a spreadsheet, the SPLIT function divides a text string around a ___, then puts each fragment into a new, separate cell.

  • delimiter (CORRECT)
  • substring
  • indicator
  • mark

Correct: In a spreadsheet, the SPLIT function divides a text string around a delimiter, then puts each fragment into a new, separate cell.
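
As a rough analogy only, Python's built-in `str.split` behaves much like the spreadsheet SPLIT function described above: it breaks a string around a delimiter and returns the fragments. The `split_text` helper and the sample string are hypothetical, and spreadsheet SPLIT has extra options that this sketch does not reproduce.

```python
def split_text(text, delimiter):
    """Break text around a delimiter and return the fragments as a list."""
    return text.split(delimiter)


# Each list element corresponds to what would land in a separate cell.
cells = split_text("Lisbon,Nairobi,Osaka", ",")
print(cells)
```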

14. Fill in the blank: A programming language is a system of words and symbols used to ___ for computers.

  • detect malware
  • repair infrastructure
  • install hardware
  • write instructions (CORRECT)

Correct: A programming language is a system of words and symbols used to write instructions for computers.

15. What are the main benefits of using a programming language to work with data? Select all that apply.

  • Automate decision-making
  • Easily reproduce and share work (CORRECT)
  • Clarify the steps of analysis (CORRECT)
  • Save time (CORRECT)

Correct: There are three main benefits of using a programming language to work with data: Easily reproduce and share work, save time, and clarify the steps of analysis.

16. In order for code to work properly, it’s necessary to follow the predetermined structure of the coding language. This includes all required words and symbols, as well as their proper placement. What is this structure called?

  • Syntax (CORRECT)
  • Standard
  • Script
  • Symbol

Correct: In order for code to work properly, it’s necessary to follow the syntax of the coding language. This includes all required words and symbols, as well as their proper placement.

17. What is the term for programming code that is freely available and may be modified and shared by the people who use it?

  • Open-source (CORRECT)
  • Common-design
  • Non-dependent
  • One-access

Correct: Open-source code is freely available and may be modified and shared by the people who use it.

18. Data professionals use programming languages to enable which of the following? Select all that apply.

  • Data governance
  • Data transformation (CORRECT)
  • Data cleaning (CORRECT)
  • Data visualization (CORRECT)

Correct: Data professionals use programming languages to enable data transformation, cleaning, and visualization.

19. What type of data visualization should be used to demonstrate how often data values fall into certain ranges?

  • Bar chart
  • Correlation chart
  • Histogram (CORRECT)
  • Tree map

Correct: To demonstrate how often data values fall into certain ranges, use a histogram.
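
The counting behind a histogram can be sketched in a few lines. This example assumes equal-width ranges and uses made-up integer ages; the `histogram_counts` helper is hypothetical and only meant to show the idea of tallying values per range.

```python
def histogram_counts(values, bin_width, start):
    """Count how many values fall into each range [low, low + bin_width)."""
    counts = {}
    for value in values:
        index = int((value - start) // bin_width)
        low = start + index * bin_width
        counts[low] = counts.get(low, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical sample data, grouped into ranges of width 10 starting at 20.
ages = [21, 25, 29, 33, 34, 38, 41, 45, 47, 52]
width = 10
counts = histogram_counts(ages, bin_width=width, start=20)

# A text version of the chart: one bar per range, one mark per value.
for low, count in counts.items():
    print(f"{low}-{low + width - 1}: {'#' * count}")
```

Each bar's length is the number of values in that range, which is what distinguishes a histogram from a bar chart of categories.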

20. Why is it more effective to label a data visualization instead of using a legend? Select all that apply.

  • Labels help keep people’s attention on relevant data by redirecting their focus away from outliers.
  • Labels can be placed near the data, whereas legends are typically positioned away from the data. (CORRECT)
  • Labels make the data visualization more accessible because they don’t rely on the ability to interpret color. (CORRECT)
  • Labels allow for text explanations to be placed directly on the visualization. (CORRECT)

Correct: It is more effective to label a data visualization instead of using a legend for several reasons: Labels can be placed near the data, they make the data visualization more accessible, and they allow for text explanations to be placed directly on the visualization.

21. Which of the following are appropriate uses for filters in data visualization tools? Select all that apply.

  • Hiding outliers that do not support the hypothesis
  • Limiting the number of rows or columns in view (CORRECT)
  • Highlighting individual data points (CORRECT)
  • Providing data to different users based on their particular needs (CORRECT)

Correct: Filters can be used to highlight individual data points, limit the number of rows or columns in view, and provide data to different users based on their needs.

22. What is data science?

  • The collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision-making
  • A process used to solve complex problems in a user-centric way
  • A field of study that uses raw data to create new ways of modeling and understanding the unknown (CORRECT)
  • A tool for organizing data elements and how they relate to one another.

Correct: Data science is a field of study that uses raw data to create new ways of modeling and understanding the unknown.

23. A dashboard is designed to share insights about the housing market in a city. What type of data visualization would be most effective at demonstrating how the city’s annual home sales have risen over time?

  • Line chart (CORRECT)
  • Scatter plot
  • Pie chart
  • Area chart

Correct: To demonstrate how the city’s annual home sales have risen over time, a line chart would be most effective.

24. What type of visualizations enable the data in a presentation to automatically update and change over time?

  • Customized
  • Static
  • Discrete
  • Dynamic (CORRECT)

Correct: Dynamic visualizations enable the data in a presentation to automatically update and change over time.

25. A data visualization reveals two variables in the data that rise and fall at the same time. When variables are related in this way, what is likely happening?

  • Correlation (CORRECT)
  • Causation
  • Divergence
  • Polarity

Correct: When two variables in a visualization rise and fall at the same time, this is an example of correlation. Correlation is the measure of the degree to which two variables change in relationship to each other.
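
One common way to measure the degree to which two variables change together is the Pearson correlation coefficient. The sketch below computes it by hand on hypothetical temperature and sales figures; the course text does not specify a particular measure, so treat the choice of Pearson's r as an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical data: both series rise together.
temperatures = [18, 21, 24, 27, 30]
sales = [110, 135, 150, 180, 205]
r = pearson_r(temperatures, sales)
print(round(r, 3))
```

A value near 1 means the variables rise and fall together, and a value near -1 means one rises as the other falls. Neither case shows that one variable causes the other, which is why the quiz answer is correlation rather than causation.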

QUIZ: MODULE 1 CHALLENGE

1. To gain insights about projects and processes, organizations acquire, organize, and interpret data. What type of business professionals help complete these tasks?

  • Data professionals (CORRECT)
  • Clients
  • Information technology professionals
  • Stakeholders

Correct!

2. Fill in the blank: Machine learning differs from automation in that it enables users to express how to perform a task by using _______ instead of explicit instructions.

  • mapping
  • schemas
  • data (CORRECT)
  • sampling

Correct!

3. A company evaluates its data using metrics in order to achieve what goals? Select all that apply.

  • To generate more data
  • To create predictive models (CORRECT)
  • To identify trends (CORRECT)
  • To inform best practices (CORRECT)

Correct!

4. What are some key advantages of the Python programming language? Select all that apply.

  • It was created within the data community.
  • It is one of the easiest programming languages to learn. (CORRECT)
  • Its formatting is visually uncluttered. (CORRECT)
  • It can be used to deploy data-driven applications. (CORRECT)

Correct!

5. Fill in the blank: Jupyter Notebook is a web-based computing platform that enables data professionals to _____ in real-time.

  • iterate on a business process
  • run code (CORRECT)
  • query databases
  • visualize data

Correct!

6. A data professional prepares to give a presentation to their colleagues. They want to communicate the story told by the data using charts and graphs made with Tableau. This helps them simplify highly technical information for non-technical stakeholders. Which of the following communication practices does this scenario describe? Select all that apply.

  • Creating a statistical model with code
  • Enriching data insights with visual elements (CORRECT)
  • Sharing complex data (CORRECT)
  • Explaining data using a graphical interface (CORRECT)

Correct!

7. Fill in the blank: _______ is a way of distributing computational tasks over a bunch of nearby processors that is good for speed and resilience and does not depend on a single source of computational power.

  • Edge computing (CORRECT)
  • Virtual reality
  • Quantum computing
  • Artificial intelligence

Correct!

8. Which of the following statements accurately describes machine learning? Select all that apply.

  • Professionals use machine learning to express how to perform a task by using explicit instructions.
  • Professionals use machine learning to express how to perform a task by using data. (CORRECT)
  • Machine learning requires iteration to achieve desired outputs. (CORRECT)
  • Machine learning involves training a model. (CORRECT)

Correct!

9. What is the Jupyter Notebook?

  • A web-based computing platform for running code in real time (CORRECT)
  • A file containing a chronologically ordered list of modifications made to a project
  • A computer programming language used to communicate with a database
  • A range of values that conveys how likely it is that a statistical estimate reflects the population

Correct!

10. A data professional uses Tableau to create data visualizations that will help people understand their analysis results. They communicate the data insights using the visualizations, which helps non-technical stakeholders gain important insights. Which of the following communication practices does this scenario describe? Select all that apply.

  • Creating a statistical model with code
  • Enriching data stories with visual elements (CORRECT)
  • Simplifying data using a graphical interface (CORRECT)
  • Sharing complex data (CORRECT)

11. Fill in the blank: Edge computing is a way of distributing ____ over a bunch of nearby processors that is good for speed and resilience and does not depend on a single source of computational power.

  • computational tasks (CORRECT)
  • coding libraries
  • models
  • data sources

Correct!

12. To gain insights, businesses rely on _____ to acquire, organize, and interpret the data that informs internal projects and processes.

  • stakeholders
  • data professionals (CORRECT)
  • clients
  • information technology professionals

Correct!

13. What process enables users to express how to perform a task by using data instead of explicit instructions?

  • Statistics
  • Machine learning (CORRECT)
  • Data Science
  • Visualization

Correct!

14. Fill in the blank: Before creating predictive models to identify trends and inform best practices, a company must _____ using metrics.

  • iterate on its processes
  • encode its data
  • evaluate its data (CORRECT)
  • present findings to stakeholders

Correct!

15. A data professional wants to strengthen their communication skills. They study methods for simplifying highly technical information and telling compelling data stories. They also practice using Tableau to design compelling charts and graphs. Which of the following communication practices does this scenario describe? Select all that apply.

  • Creating a statistical model with code
  • Sharing complex data (CORRECT)
  • Enriching data insights with visual elements (CORRECT)
  • Explaining data using a graphical interface (CORRECT)

Correct!

16. What web-based computing platform can be used by data professionals when interacting with Python?

  • SQL
  • HTML
  • R Markdown
  • Jupyter Notebook (CORRECT)

Correct!

17. Fill in the blank: Data professionals use _____ to work efficiently with large datasets.

  • programming languages (CORRECT)
  • schemas
  • data visualizations
  • spreadsheets

Correct: Data analytics professionals use programming languages to work efficiently within large datasets.

18. Before creating predictive models to identify trends and inform best practices, a company must evaluate its data using what type of measurement?

  • SMART methodology
  • Metrics (CORRECT)
  • Attributes
  • Best practices

Correct: Before creating predictive models to identify trends and inform best practices, a company must evaluate its data using metrics.

19. What are some key advantages of the Python programming language? Select all that apply.

  • It has an enormous online community and other helpful resources. (CORRECT)
  • It is very flexible. (CORRECT)
  • It is one of the easiest programming languages to write. (CORRECT)
  • It was created within the data community.

Correct!

20. Fill in the blank: Edge computing is a way of distributing computational tasks over a bunch of nearby processors that is good for _______ and resilience and does not depend on a single source of computational power.

  • speed (CORRECT)
  • augmented reality
  • algorithms
  • artificial intelligence

Correct!

21. What is the term for someone who explores, cleans, analyzes, and visualizes data?

  • Information technology professional
  • Client
  • Data professional (CORRECT)
  • Stakeholder

Correct!