Question 1

Fill in the blank: Missing data has a value that is not stored for a _____ in a dataset.

Accepted Answer

variable (CORRECT)

Question 2

A data professional requests additional information from a dataset’s original owner. Unfortunately, they are not able to provide the information. Therefore, the data professional creates a NaN category in the dataset. What concept does this scenario describe?

Accepted Answer

Solving the problem of missing data (CORRECT)

Question 3

What is the function of the parameters how and on in this code?

Accepted Answer

To tell Python which way to join the data and which column to join from (CORRECT)
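
The code for this question is not reproduced here. As a minimal sketch, assuming two hypothetical dataframes, `df` and `df_zip`, that share a hypothetical key column `zip_code`, `how` chooses the join type and `on` names the column whose values are matched:

```python
import pandas as pd

# Hypothetical data: store records and a zip-code lookup table
df = pd.DataFrame({
    "store_id": [1, 2, 3],
    "zip_code": ["10001", "94105", "60601"],
})
df_zip = pd.DataFrame({
    "zip_code": ["10001", "94105"],
    "city": ["New York", "San Francisco"],
})

# how="left": keep every row of df, even without a match in df_zip
# on="zip_code": match rows where the zip_code values are equal
merged = df.merge(df_zip, how="left", on="zip_code")
print(merged)
```

With `how="left"`, store 3 is kept and gets NaN in the `city` column; with `how="inner"`, it would be dropped because 60601 has no match in `df_zip`.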

Question 4

Non-null count is the total number of blank data entries within a data column.

Accepted Answer

False (CORRECT)
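
The statement is false because the non-null count counts entries that are present, not blank. In pandas, `df.info()` reports it as "Non-Null Count". This sketch, using a small hypothetical dataframe, computes the same number with `count()` and the blank entries with `isnull().sum()`:

```python
import numpy as np
import pandas as pd

# Hypothetical product data with some missing entries
df = pd.DataFrame({
    "price": [9.99, np.nan, 4.50, np.nan],
    "product": ["pen", "ink", None, "pad"],
})

# Non-null count: entries that are NOT missing in each column
non_null = df.count()      # price: 2, product: 3

# Blank (missing) entries are the complement
missing = df.isnull().sum()  # price: 2, product: 1

print(non_null)
print(missing)
```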

Question 5

What type of outlier is a normal data point under certain conditions, but becomes an anomaly under most other conditions?

Accepted Answer

Contextual outlier (CORRECT)
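
A contextual outlier only stands out relative to its context. The sketch below uses hypothetical temperature readings and applies the 1.5 times IQR rule within each season instead of across all the data. A reading of 29 °C is ordinary in summer but is flagged in winter, while the same rule applied to the pooled data flags nothing:

```python
import pandas as pd

# Hypothetical daily temperatures (°C), labeled by season (the context)
readings = pd.DataFrame({
    "season": ["summer"] * 5 + ["winter"] * 5,
    "temp_c": [28, 30, 31, 29, 30, 2, 0, 3, 1, 29],
})


def iqr_flags(s: pd.Series) -> pd.Series:
    """Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)


# Contextual check: quartiles are computed separately for each season
readings["contextual_outlier"] = (
    readings.groupby("season")["temp_c"].transform(iqr_flags)
)

# Global check: quartiles are computed over all readings at once
readings["global_outlier"] = iqr_flags(readings["temp_c"])

print(readings)
```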

Question 6

What is the term for a line of text that follows a method or function, which is used to explain the purpose of that method or function to others using the same code?

Accepted Answer

Docstring (CORRECT)
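
In Python, a docstring is the string literal written as the first statement inside a function or method body. The function below is a hypothetical example; the Args/Returns layout is one common convention, not a requirement:

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius.

    Args:
        temp_f: Temperature in degrees Fahrenheit.

    Returns:
        The same temperature in degrees Celsius.
    """
    return (temp_f - 32) * 5 / 9


# The docstring is stored on the function object; help() displays it
print(fahrenheit_to_celsius.__doc__)
```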

Question 7

A data professional is using a box plot to identify suspected high outliers in a dataset, according to the interquartile rule. To do that, they search for data points greater than the third quartile plus what standard of the interquartile range?

Accepted Answer

1.5 times (CORRECT)
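
The interquartile rule can be computed directly. This sketch uses a hypothetical series and pandas' default linear-interpolation quartiles; other quartile conventions can shift the fences slightly. Matplotlib's box plots use the same 1.5 factor for their whiskers by default.

```python
import pandas as pd

values = pd.Series([12, 14, 14, 15, 16, 17, 18, 45])  # hypothetical data

q1 = values.quantile(0.25)  # first quartile
q3 = values.quantile(0.75)  # third quartile
iqr = q3 - q1               # interquartile range

# Suspected outliers fall outside these fences
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr

high_outliers = values[values > upper_fence]
print(upper_fence, high_outliers.tolist())
```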

Question 8

Fill in the blank: Label encoding assigns each category a unique _____ instead of a qualitative value.

Accepted Answer

number (CORRECT)
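
Two minimal ways to label-encode a categorical column in pandas, using hypothetical pet data: categorical codes (assigned in sorted category order) or an explicit mapping like the one in the veterinary scenario later in this quiz.

```python
import pandas as pd

pets = pd.Series(["dog", "cat", "hamster", "dog", "cat"])  # hypothetical data

# Option 1: categorical codes; categories sort alphabetically,
# so cat=0, dog=1, hamster=2
pet_codes = pets.astype("category").cat.codes

# Option 2: an explicit mapping, e.g. 1 = dog, 2 = cat, 3 = hamster
pet_labels = pets.map({"dog": 1, "cat": 2, "hamster": 3})

print(pet_codes.tolist(), pet_labels.tolist())
```

The numbers are only labels. Some models may treat them as ordered quantities, which is one reason dummy variables are sometimes preferred.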

Question 9

When working with dummy variables, data professionals may assign the variables an infinite number of values.

Accepted Answer

False (CORRECT)

Question 10

Which pandas function does a data professional use to convert categorical variables into dummy variables?

Accepted Answer

get_dummies() (CORRECT)
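
A short sketch of `pd.get_dummies()` on a hypothetical fuel-type column. Each dummy column holds only 0 or 1, which is why dummy variables cannot take an infinite number of values:

```python
import pandas as pd

cars = pd.DataFrame({"fuel": ["gas", "electric", "gas", "hybrid"]})  # hypothetical

# One 0/1 column per category; dtype=int gives 0/1 instead of True/False
dummies = pd.get_dummies(cars["fuel"], prefix="fuel", dtype=int)
print(dummies)
```

Passing `drop_first=True` drops one category's column, so a two-category variable such as gas versus electric ends up as a single 0/1 column.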

Question 11

Data professionals use input validation to ensure data is complete, error-free, and of high quality.

Accepted Answer

True (CORRECT)
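
Input validation usually means running a set of checks before analysis. The sketch below is illustrative only: the function name `validate_input` and its rules (missing values, duplicate rows, negative ages) are hypothetical, not a standard pandas API.

```python
import pandas as pd


def validate_input(df: pd.DataFrame) -> list[str]:
    """Return human-readable problems found in df (hypothetical rules)."""
    problems = []
    # Completeness: report columns that contain missing values
    missing = df.isnull().sum()
    for col, n in missing[missing > 0].items():
        problems.append(f"{col}: {n} missing value(s)")
    # Duplicates: report repeated rows
    n_dupes = int(df.duplicated().sum())
    if n_dupes:
        problems.append(f"{n_dupes} duplicate row(s)")
    # Plausibility: an age can never be negative
    if "age" in df.columns and (df["age"] < 0).any():
        problems.append("age: negative values")
    return problems


df = pd.DataFrame({"name": ["Ana", "Ben", "Ana", "Cy"],
                   "age": [34, -2, 34, None]})
problems = validate_input(df)
print(problems)
```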

Question 12

Fill in the blank: If a dataset lacks sufficient information to answer a business question, the process of _____ makes it possible to augment that data by adding values from other datasets.

Accepted Answer

Joining (CORRECT)

Question 13

In which phase of the PACE workflow would a data professional perform the majority of the data-validation process?

Accepted Answer

Analyze (CORRECT)

Question 14

Which of the following terms are used to describe missing data? Select all that apply.

Accepted Answer

Blank, NaN, N/A (CORRECT)
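
pandas recognizes these markers when reading files. By default, `pd.read_csv()` treats "N/A", "NaN", and an empty field as missing; this is controlled by its `na_values` and `keep_default_na` parameters. A sketch with hypothetical inline CSV data:

```python
import io

import pandas as pd

# Hypothetical CSV: three different ways of writing "no value"
raw = io.StringIO(
    "city,temp\n"
    "Austin,N/A\n"
    "Boston,NaN\n"
    "Chicago,\n"
    "Denver,12.5\n"
)
df = pd.read_csv(raw)

# All three markers become NaN, so isnull() flags them the same way
print(df["temp"].isnull())
```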

Question 15

Stakeholders at a film studio hire a data analytics firm to provide insights about the best locations for film shoots. However, the film studio’s datasets contain missing data. Which of the following strategies can help the data analytics firm solve this problem? Select all that apply.

Accepted Answer

Create a NaN category. Add in the missing values by taking the average values from the existing data. Ask the film studio to fill in the missing values. (CORRECT)
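
Two of these strategies can be sketched in pandas; asking the data owner is a process step, not code. The column names below are hypothetical, and the literal label "NaN" is one possible choice for the explicit missing category.

```python
import numpy as np
import pandas as pd

# Hypothetical film-shoot data with gaps
shoots = pd.DataFrame({
    "location_type": ["urban", np.nan, "desert", "urban"],
    "daily_cost": [5000.0, 7000.0, np.nan, 6000.0],
})

# Strategy 1: give missing categories their own explicit label
shoots["location_type"] = shoots["location_type"].fillna("NaN")

# Strategy 2: fill missing numbers with the mean of the observed values
shoots["daily_cost"] = shoots["daily_cost"].fillna(shoots["daily_cost"].mean())

print(shoots)
```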

Question 16

Which section of the code refers to the dataframe to be merged with df?

Accepted Answer

df_zip (CORRECT)

Question 17

What pandas function is used to pull all of the missing values from a data frame?

Accepted Answer

pd.isnull() (CORRECT)
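
`pd.isnull()` (an alias of `pd.isna()`) returns True wherever a value is missing, and that mask can be used to pull the affected rows. A sketch with hypothetical sales data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "south", "east", "west"],
    "sales": [120.0, np.nan, 95.0, np.nan],
})

# Same shape as df: True where a value is missing
mask = pd.isnull(df)

# Pull every row that contains at least one missing value
rows_with_missing = df[mask.any(axis=1)]
print(rows_with_missing)
```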

Question 18

What type of outliers are values that are completely different from the overall data group and have no association with any other outliers?

Accepted Answer

Global outliers (CORRECT)

Question 19

A data professional works for a car insurance company. To gain insights about the popularity of electric vehicles, they study categorical data about cars. They add a 0 to their dataset to indicate if a car is gas-powered and a 1 if a car is electric. What does this scenario describe?

Accepted Answer

Using dummy variables (CORRECT)

Question 20

What type of data visualization shows the concentration of values between two data points by illustrating their magnitude with two colors?

Accepted Answer

Heat map (CORRECT)

Question 21

What does the pandas function pd.duplicated() return to indicate that a data value does not have a duplicate value within the same dataset?

Accepted Answer

False (CORRECT)

Question 22

Fill in the blank: The pandas function _____ enables data professionals to create a new dataframe with all duplicate rows removed.

Accepted Answer

drop_duplicates() (CORRECT)
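
In code, `duplicated()` and `drop_duplicates()` are called as methods on a DataFrame or Series. By default, `duplicated()` returns False for the first occurrence of a row and True for each repeat, and `drop_duplicates()` returns a new dataframe without the repeats. A sketch with hypothetical order data:

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [101, 102, 102, 103],
    "item": ["lamp", "desk", "desk", "chair"],
})

# False = no earlier copy of this row, True = duplicate of an earlier row
flags = orders.duplicated()

# New dataframe with duplicate rows removed; orders itself is unchanged
deduped = orders.drop_duplicates()

print(flags.tolist())
print(deduped)
```

Passing `keep=False` to `duplicated()` marks every copy of a repeated row as True, including the first.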

Question 23

Which of the following terms can be used to describe a value that is not stored for a variable in a set of data? Select all that apply.

Accepted Answer

N/A, NaN, Blank (CORRECT)

Question 24

Which of the following is a parameter for the merge?

Accepted Answer

how='left' (CORRECT)

Question 25

What tasks could the pandas function pd.isnull() be used for? Select all that apply.

Accepted Answer

To identify when a value is missing from a data frame. To pull all of the missing values from a data frame. (CORRECT)

Question 26

Fill in the blank: Contextual outliers are normal data points under certain conditions but become _____ under most other conditions.

Accepted Answer

Anomalies (CORRECT)

Question 27

A data professional works for a veterinary office. To gain insights about the most common household pets, they study categorical data about pet adoptions over the past five years. They assign the number 1 to dogs, 2 to cats, 3 to hamsters, and so on. What does this scenario describe?

Accepted Answer

Label encoding (CORRECT)

Question 28

Fill in the blank: A _____ is a data visualization that displays the magnitude of a set of values using two colors to show the concentration of the values.

Accepted Answer

heat map (CORRECT)

Question 29

Fill in the blank: A data professional should _____ a duplicate when its value is clearly a mistake or will misrepresent the remaining unique values within the dataset.

Accepted Answer

Eliminate (CORRECT)

Question 30

Fill in the blank: N/A and NaN are terms used to describe _____ data.

Accepted Answer

Missing (CORRECT)

Question 31

What does the pandas function pd.duplicated() return to indicate that a data value is a duplicate of another value within the same dataset?

Accepted Answer

True (CORRECT)

Question 32

A data professional at a garden center researches data related to ideal growing climates. As they familiarize themselves with the datasets, they discover some data is missing. Which of the following strategies can help them solve this problem? Select all that apply.

Accepted Answer

Create a NaN category. Derive new representative values based on available data. Add in the missing values by taking the average values from the existing data. (CORRECT)
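
"Deriving new representative values" can take many forms. One possible technique for ordered data is linear interpolation, which estimates each gap from its neighbors. A sketch with hypothetical monthly temperatures:

```python
import numpy as np
import pandas as pd

# Hypothetical average monthly temperatures with two gaps
monthly_temp = pd.Series(
    [10.0, 12.0, np.nan, 16.0, np.nan, 20.0],
    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
)

# Linear interpolation (the default) treats points as evenly spaced
filled = monthly_temp.interpolate()
print(filled)
```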

Question 33

What pandas function enables a data professional to determine if duplicate values are present in a dataset?

Accepted Answer

duplicated() (CORRECT)

Question 34

A data team for an investment banker works on a project related to interest rates. As they familiarize themselves with the datasets, they discover some data is missing. Which of the following strategies can help them solve this problem? Select all that apply.

Accepted Answer

Ask the owner of the data to fill in the missing values. Derive new representative values based on available data. Add in the missing values by taking the average values from the existing data. (CORRECT)

Question 35

A data team works for a stereo installation company. To gain insights into what products people are most likely to purchase in the coming year, they review categorical data about 20 of the most popular stereos. Rather than using brand names, they assign a different number to each stereo to make the data simpler to join. What does this scenario describe?

Accepted Answer

Label encoding (CORRECT)

Question 36

Which of the following indicates that the first data frame should be merged with another data frame?

Accepted Answer

merge() (CORRECT)

Question 37

What pandas function is used to identify when a value is missing from a data frame?

Accepted Answer

pd.isnull() (CORRECT)

Question 38

Data encoded as N/A, NaN, or a blank is defined as zero.

Accepted Answer

False (CORRECT)
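
Missing is not the same as zero, and the difference changes results. pandas skips NaN in summaries such as `mean()`, while filling the gap with 0 drags the average down. A sketch with hypothetical scores:

```python
import numpy as np
import pandas as pd

scores = pd.Series([80.0, np.nan, 90.0])

# NaN is skipped: (80 + 90) / 2
mean_skipping_missing = scores.mean()

# Treating the missing value as 0: (80 + 0 + 90) / 3
mean_if_zero = scores.fillna(0).mean()

# NaN does not even equal itself, so use isnull() rather than == to find it
print(np.nan == 0, np.nan == np.nan, scores.isnull().sum())
print(mean_skipping_missing, mean_if_zero)
```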

Question 39

What is indicated by the term null?

Accepted Answer

The data is missing. (CORRECT)

Question 40

Fill in the blank: Outliers are observations that are an _____ distance from other values.

Accepted Answer

abnormal (CORRECT)

Question 41

Docstrings are useful within a line of Python code, but they cannot be exported to create library documentation.

Accepted Answer

False (CORRECT)
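
Docstrings can be exported because tools read them from the code. Python's standard-library `pydoc` module renders them as text or HTML, and documentation generators such as Sphinx's autodoc build library documentation from them. A minimal sketch with a hypothetical function:

```python
import pydoc


def mean_temp(values):
    """Return the arithmetic mean of a non-empty list of temperatures."""
    return sum(values) / len(values)


# Plain-text documentation, the same content help() would show
text_doc = pydoc.render_doc(mean_temp, renderer=pydoc.plaintext)

# HTML documentation for the same function
html_doc = pydoc.HTMLDoc().document(mean_temp)

print(text_doc)
```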

Question 42

Categorical data can be grouped on its qualities, thus enabling data professionals to store and identify it based on its category.

Accepted Answer

True (CORRECT)

Question 43

Fill in the blank: A heat map uses _____ to depict the magnitude of an instance or set of values.

Accepted Answer

Colors (CORRECT)

COURSE 3: GO BEYOND THE NUMBERS: TRANSLATE DATA INTO INSIGHTS

Module 3: Clean Your Data

GOOGLE ADVANCED DATA ANALYTICS PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

TABLE OF CONTENTS

INTRODUCTION – Clean Your Data

Learning Objectives

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: THE CHALLENGE OF MISSING OR DUPLICATE DATA

1. Fill in the blank: Missing data has a value that is not stored for a _____ in a dataset.

2. A data professional requests additional information from a dataset’s original owner. Unfortunately, they are not able to provide the information. Therefore, the data professional creates a NaN category in the dataset. What concept does this scenario describe?

3. When merging data, a data professional uses the following code:

What is the function of the parameters how and on in this code?

4. Non-null count is the total number of blank data entries within a data column.

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: THE INS AND OUTS OF DATA OUTLIERS

1. What type of outlier is a normal data point under certain conditions, but becomes an anomaly under most other conditions?

2. What is the term for a line of text that follows a method or function, which is used to explain the purpose of that method or function to others using the same code?

3. A data professional is using a box plot to identify suspected high outliers in a dataset, according to the interquartile rule. To do that, they search for data points greater than the third quartile plus what standard of the interquartile range?

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: CHANGING CATEGORICAL DATA TO NUMERICAL DATA

1. Fill in the blank: Label encoding assigns each category a unique _____ instead of a qualitative value.

2. When working with dummy variables, data professionals may assign the variables an infinite number of values.

3. Which pandas function does a data professional use to convert categorical variables into dummy variables?

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: INPUT VALIDATION

1. Data professionals use input validation to ensure data is complete, error-free, and of high quality.

2. Fill in the blank: If a dataset lacks sufficient information to answer a business question, the process of _____ makes it possible to augment that data by adding values from other datasets.

3. In which phase of the PACE workflow would a data professional perform the majority of the data-validation process?

QUIZ: MODULE 3 CHALLENGE

1. Which of the following terms are used to describe missing data? Select all that apply.

2. Stakeholders at a film studio hire a data analytics firm to provide insights about the best locations for film shoots. However, the film studio’s datasets contain missing data. Which of the following strategies can help the data analytics firm solve this problem? Select all that apply.

3. A data professional writes the following code:

Which section of the code refers to the dataframe to be merged with df?

4. What pandas function is used to pull all of the missing values from a data frame?

5. What type of outliers are values that are completely different from the overall data group and have no association with any other outliers?

6. A data professional works for a car insurance company. To gain insights about the popularity of electric vehicles, they study categorical data about cars. They add a 0 to their dataset to indicate if a car is gas-powered and a 1 if a car is electric. What does this scenario describe?

7. What type of data visualization shows the concentration of values between two data points by illustrating their magnitude with two colors?

8. What does the pandas function pd.duplicated() return to indicate that a data value does not have a duplicate value within the same dataset?

9. Fill in the blank: The pandas function _____ enables data professionals to create a new dataframe with all duplicate rows removed.

10. Which of the following terms can be used to describe a value that is not stored for a variable in a set of data? Select all that apply.

11. A data professional writes the following code:

Which of the following is a parameter for the merge?

12. What tasks could the pandas function pd.isnull() be used for? Select all that apply.

13. Fill in the blank: Contextual outliers are normal data points under certain conditions but become _____ under most other conditions.

14. A data professional works for a veterinary office. To gain insights about the most common household pets, they study categorical data about pet adoptions over the past five years. They assign the number 1 to dogs, 2 to cats, 3 to hamsters, and so on. What does this scenario describe?

15. Fill in the blank: A _____ is a data visualization that displays the magnitude of a set of values using two colors to show the concentration of the values.

16. Fill in the blank: A data professional should _____ a duplicate when its value is clearly a mistake or will misrepresent the remaining unique values within the dataset.

17. Fill in the blank: N/A and NaN are terms used to describe _____ data.

18. What does the pandas function pd.duplicated() return to indicate that a data value is a duplicate of another value within the same dataset?

19. A data professional at a garden center researches data related to ideal growing climates. As they familiarize themselves with the datasets, they discover some data is missing. Which of the following strategies can help them solve this problem? Select all that apply.

20. What pandas function enables a data professional to determine if duplicate values are present in a dataset?

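21. A data team for an investment banker works on a project related to interest rates. As they familiarize themselves with the datasets, they discover some data is missing. Which of the following strategies can help them solve this problem? Select all that apply.

22. A data team works for a stereo installation company. To gain insights into what products people are most likely to purchase in the coming year, they review categorical data about 20 of the most popular stereos. Rather than using brand names, they assign a different number to each stereo to make the data simpler to join. What does this scenario describe?

23. A data professional writes the following code: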

Which of the following indicates that the first data frame should be merged with another data frame?

24. What pandas function is used to identify when a value is missing from a data frame?

25. Data encoded as N/A, NaN, or a blank is defined as zero.

26. What is indicated by the term null?

27. Fill in the blank: Outliers are observations that are an _____ distance from other values.

28. Docstrings are useful within a line of Python code, but they cannot be exported to create library documentation.

29. Categorical data can be grouped on its qualities, thus enabling data professionals to store and identify it based on its category.

30. Fill in the blank: A heat map uses _____ to depict the magnitude of an instance or set of values.
