Maximum value in the price column of the car_info table

Course 4 – Process Data From Dirty to Clean Quiz Answers

Week 3: Cleaning Data with SQL

GOOGLE DATA ANALYTICS PROFESSIONAL CERTIFICATION

Complete Study Guide

In this section of the Coursera Google Data Analytics course, you’ll learn how to clean data using SQL. You will explore queries and functions that allow you to identify and remove errors, inconsistencies, and redundant information from your datasets. This process is essential for accurate data analysis and can save time in the long run by reducing manual cleaning efforts. With a better understanding of the tools available to you, you’ll be able to create clean datasets that are ready for analysis confidently.

Learning Objectives

  • Describe how SQL can be used to clean large datasets
  • Compare spreadsheet data-cleaning functions to those associated with SQL in databases
  • Develop basic SQL queries for use with databases
  • Apply basic SQL functions for use in cleaning string variables in a database
  • Apply basic SQL functions for transforming data variables

Test your knowledge on sql

1. What is the maximum value in the price column of the car_info table?

  • 45,400 (Correct)
  • 12,978
  • 5,1180
  • 16,430

Correct: For the Maximum value in the price column of the car_info table, to ensure that the values in the price column fell within the expected range, you used the MIN and MAX functions to determine that the maximum price was 45, 400. Knowing this, you were able to clean this column and prepare for analysis. Going forward, you will continue to check columns with numeric data in BigQuery to make sure your data is clean. This will help you quickly identify issues with your data that might cause errors during analysis.

2. Which of the following are benefits of using SQL? Select all that apply.

  • SQL offers powerful tools for cleaning data. (Correct)
  • SQL can handle huge amounts of data. (Correct)
  • SQL can be used to program microprocessors on database servers.
  • SQL can be adapted and used with multiple database programs. (Correct)

Correct: SQL can handle huge amounts of data, can be adapted and used with multiple database programs, and offers powerful tools for cleaning data.

3. Which of the following tasks can data analysts do using both spreadsheets and SQL? Select all that apply.

  • Process huge amounts of data efficiently
  • Use formulas (Correct)
  • Join data (Correct)
  • Perform arithmetic (Correct)

Correct: Analysts can use SQL and spreadsheets to perform arithmetic, use formulas, and join data.

4. SQL is a language used to communicate with databases. Like most languages, SQL has dialects. What are the advantages of learning and using standard SQL? Select all that apply.

  • Standard SQL works with a majority of databases. (Correct)
  • Standard SQL is automatically translated by databases to other dialects.
  • Standard SQL is much easier to learn than other dialects.
  • Standard SQL requires a small number of syntax changes to adapt to other dialects. (Correct)

Correct: Standard SQL works with a majority of databases and requires a small number of syntax changes to adapt to other dialects.

5. In your last query, you processed 415.8 GB of data. How many rows were returned by the query?

  • 198,768
  • 225,038
  • 305,710
  • 214,710 (Correct)

Correct: The last query you ran returns 214,710 rows of data. At the bottom of the data preview, you can see how many rows you returned. Going forward, you can apply this knowledge of data size measurements to better understand how much data you will work with and what tool is best suited to each data analysis project.

Test your knowledge on SQl Queries

1. Which of the following SQL functions can data analysts use to clean string variables? Select all that apply.

  • SUBSTR (Correct)
  • LENGTH
  • COUNTIF
  • TRIM (Correct)

Correct: Data analysts can use the SUBSTR and TRIM functions to clean string variables.

2. You are working with a database table that contains data about playlists for different types of digital media. The table includes columns for playlist_id and name. You want to remove duplicate entries for playlist names and sort the results by playlist ID.

You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the name column.

NOTE: The three dots (…) indicate where to add the clause.

Course_4_Weekly_3.1

What playlist name appears in row 6 of your query result?

  • TV Shows
  • Movies
  • Audiobooks
  • Music Videos (Correct)

Correct: The clause DISTINCT name will remove duplicate entries from the name column. The complete query is SELECT DISTINCT name FROM playlist ORDER BY playlist_id. The DISTINCT clause removes duplicate entries from your query result. The playlist name Music Videos appears in row 6 of your query result.

3. You are working with a database table that contains data about music albums. The table includes columns for album_id, title, and artist_id. You want to check for album titles that are less than 4 characters long.

You write the SQL query below. Add a LENGTH function that will return any album titles that are less than 4 characters long.

Course_4_Weekly_3.2

What album ID number appears in row 3 of your query result?

  • 236
  • 131
  • 182 (Correct)
  • 239

Correct: The function LENGTH(title) < 4 will return any album names that are less than 4 characters long. The complete query is SELECT * FROM album WHERE LENGTH(title) < 4. The LENGTH function counts the number of characters a string contains. The album ID number 182 appears in row 3 of your query result.

4. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, and country. You want to retrieve the first 3 letters of each country name. You decide to use the SUBSTR function to retrieve the first 3 letters of each country name, and use the AS command to store the result in a new column called new_country.

You write the SQL query below. Add a statement to your SQL query that will retrieve the first 3 letters of each country name and store the result in a new column as new_country.

NOTE: The three dots (…) indicate where to add the statement.

Course_4_Weekly_3.3

What customer ID number appears in row 2 of your query result?

  • 55 (Correct)
  • 28
  • 47
  • 3

Correct: The statement SUBSTR(country, 1, 3) AS new_country will retrieve the first 3 letters of each state name and store the result in a new column as new_country. The complete query is SELECT customer_id, SUBSTR(country, 1, 3) AS new_country FROM customer ORDER BY country. The SUBSTR function extracts a substring from a string. This function instructs the database to return 3 characters of each country, starting with the first character. The customer ID number 55 appears in row 2 of your query result. 

GOOGLE DATA ANALYTICS COURSERA ANSWERS AND STUDY GUIDE

Liking our content? Then don’t forget to add us to your bookmarks so you can find us easily!

Weekly Breakdown | Google Study Guides | Back to Top

Process Data from Dirty to Clean Weekly Challenge 3

1. Fill in the blank: Data analysts usually use _____ to deal with very large datasets.

  • web browsers
  • spreadsheets
  • SQL (Correct)
  • word processors

Correct: Data analysts usually use SQL to deal with very large datasets.

2. In which of the following situations would a data analyst use spreadsheets instead of SQL? Select all that apply.

  • When using a language to interact with multiple database programs
  • When working with a dataset with more than 1,000,000 rows
  • When working with a small dataset (Correct)
  • When visually inspecting data (Correct)

Correct: An analyst would choose to use spreadsheets instead of SQL when visually inspecting data or working with a small dataset.

3. A data analyst is managing a database of customer information for a retail store. What SQL command can the analyst use to add a new customer to the database?

  • INSERT INTO (Correct)
  • UPDATE
  • DROP TABLE IF EXISTS
  • CREATE TABLE IF NOT EXISTS

Correct: The analyst can use the INSERT INTO command to add a new customer to the database.

4. You are working with a database table that contains invoice data. The table includes columns for invoice_id and billing_state. You want to remove duplicate entries for billing state and sort the results by invoice ID.

You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the billing_state column.

NOTE: The three dots (…) indicate where to add the clause.

Course_4_Challenge_3.1

What billing state appears in row 17 of your query result?

  • CA
  • NV
  • AZ (Correct)
  • WI

Correct: The clause DISTINCT billing_state will remove duplicate entries from the billing_state column. The complete query is SELECT DISTINCT billing_state FROM invoice ORDER BY invoice_id. The DISTINCT clause removes duplicate entries from your query result. The billing state AZ appears in row 17 of your query result.

5. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. The state names are abbreviated. You want to check for state names that are greater than 2 characters long.

You write the SQL query below. Add a LENGTH function that will return any state names that are greater than 2 characters long.

Course_4_Challenge_3.2

What country appears in row 1 of your query result?

  • Chile
  • Ireland (Correct)
  • India
  • France

Correct: The function LENGTH(state) > 2 will return any state names that are greater than 2 characters long. The complete query is SELECT * FROM customer WHERE LENGTH(state) > 2. The LENGTH function counts the number of characters a string contains. The country Ireland appears in row 1 of your query result.

6. In SQL databases, what data type refers to a number that contains a decimal?

  • Integer
  • Boolean
  • String
  • Float (Correct)

Correct: In SQL databases, the float data type refers to a number that contains a decimal.

7. Fill in the blank: In SQL databases, the _____ function can be used to convert data from one datatype to another.

  • SUBSTR
  • TRIM
  • LENGTH
  • CAST (Correct)

Correct: The CAST function can be used to convert data from one datatype to another.

8. A data analyst is cleaning survey data. The results for an optional question contain many nulls. What function can the analyst use to eliminate the null values from the results?

  • LENGTH
  • CAST
  • COALESCE (Correct)
  • CONCAT

Correct: The analyst can use the COALESCE function to eliminate the null values from the results.

9. You are working with a database table that contains invoice data. The table includes columns about billing location such as billing_city, billing_state, and billing_country. You want to retrieve the first 4 letters of each city name. You decide to use the SUBSTR function to retrieve the first 4 letters of each city name, and use the AS command to store the result in a new column called new_city.

You write the SQL query below. Add a statement to your SQL query that will retrieve the first 4 letters of each city name and store the result in a new column as new_city.

NOTE: The three dots (…) indicate where to add the statement.

Course_4_Challenge_3.3

10. What invoice ID number appears in row 7 of your query result?

  • 97
  • 390 (Correct)
  • 206
  • 23

Correct: The statement SUBSTR(billing_city, 1, 4) AS new_city will retrieve the first 4 letters of each city name and store the result in a new column as new_city. The complete query is SELECT invoice_id, SUBSTR(billing_city, 1, 4) AS new_city FROM invoice ORDER BY billing_city. The SUBSTR function extracts a substring from a string. This function instructs the database to return 4 characters of each billing city, starting with the first character. The invoice ID number 390 appears in row 7 of your query result.

11. Data analysts choose SQL for which of the following reasons? Select all that apply. 

  • SQL is a programming language that can also create web apps 
  • SQL is a powerful software program
  • SQL is a well-known standard in the professional community (CORRECT)
  • SQL can handle huge amounts of data (CORRECT)

Correct: Data analysts choose SQL because it is a well-known standard in the professional community. SQL can also handle huge amounts of data. 

Correct: Data analysts choose SQL because it can handle huge amounts of data. SQL is also a well-known standard in the professional community. 

12. In which of the following situations would a data analyst use SQL instead of a spreadsheet? Select all that apply. 

  • When using the COUNTIF function to find a specific piece of information
  • When working with a huge amount of data (CORRECT)
  • When recording queries and changes throughout a project (CORRECT)
  • When quickly pulling information from many different sources in a database (CORRECT)

Correct: A data analyst would use SQL instead of a spreadsheet to work with a huge amount of data. SQL can also quickly pull information from many different sources in a database and record queries and changes throughout a project.

Correct: A data analyst would use SQL instead of a spreadsheet to work with a huge amount of data. SQL can also quickly pull information from many different sources in a database and record queries and changes throughout a project.

Correct: A data analyst would use SQL instead of a spreadsheet to work with a huge amount of data. SQL can also quickly pull information from many different sources in a database and record queries and changes throughout a project.

13. A data analyst creates many new tables in their company’s database. When the project is complete, the analyst wants to remove the tables so they don’t clutter the database. What SQL commands can they use to delete the tables? 

  • INSERT INTO 
  • DROP TABLE IF EXISTS (CORRECT)
  • CREATE TABLE IF NOT EXISTS 
  • UPDATE 

Correct: The analyst can use the DROP TABLE IF EXISTS query to delete the tables so they don’t clutter the database. 

14. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. You want to check for city names that are greater than 9 characters long. 

You write the SQL query below. Add a LENGTH function that will return any city names that are greater than 9 characters long.

SELECT 

*

FROM

customer

WHERE

What is the first name of the customer that appears in row 7 of your query result?

  • Roberto 
  • Diego 
  • Kara
  • Julia (CORRECT)

15. A data analyst is cleaning transportation data for a ride-share company. The analyst converts the data on ride duration from text strings to floats. What does this scenario describe? 

  • Typecasting (CORRECT)
  • Processing 
  • Calculating 
  • Visualizing 

Correct: The analyst is typecasting. Typecasting means converting data from one type to another. 

16. A data analyst is working with product sales data. They import new data into a database. The database recognizes the data for product price as text strings. What SQL function can the analyst use to convert text strings to floats?

  • SUBSTR
  • TRIM
  • LENGTH
  • CAST (CORRECT)

Correct: The analyst can use the CAST function to convert text strings to floats.

17. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, and country. The state names are abbreviated. You want to retrieve the first 2 letters of each state name. You decide to use the SUBSTR function to retrieve the first 2 letters of each state name, and use the AS command to store the result in a new column called new_state. 

You write the SQL query below. Add a statement to your SQL query that will retrieve the first 2 letters of each state name and store the result in a new column as new_state. 

NOTE: The three dots (…) indicate where to add the statement.

SELECT

customer_id,

FROM

customer

ORDER BY

state DESC

What customer ID number appears in row 9 of your query result?

  • 3
  • 55
  • 47 (CORRECT)
  • 10

Correct: The statement SUBSTR(state, 1, 2) AS new_state will retrieve the first 2 letters of each state name and store the result in a new column as new_state. The complete query is SELECT customer_id, SUBSTR(state, 1, 2) AS new_state FROM customer ORDER BY state DESC. The SUBSTR function extracts a substring from a string. This function instructs the database to return 2 characters of each state name, starting with the first character. The customer ID number 47 appears in row 9 of your query result.

18. What are some of the benefits of using SQL for analysis? Select all that apply.

  • SQL has built-in functionalities.
  • SQL tracks changes across a team. (CORRECT)
  • SQL can pull information from different database sources. (CORRECT)
  • SQL interacts with database programs. (CORRECT)

Correct: Some benefits of SQL include tracking changes across a team, interacting with database programs, and pulling information from different database sources.

19. Fill in the blank: _____ refers to the process of converting data from one type to another. 

  • Formatting
  • Cleaning 
  • Typecasting (CORRECT)
  • Querying

Correct: Typecasting refers to the process of converting data from one type to another. 

20. The CAST function can be used to convert the DATE datatype to the DATETIME datatype. 

  • True (CORRECT)
  • False

Correct: The CAST function can be used to convert the DATE datatype to the DATETIME datatype. CAST can be used to convert any database field from one datatype to another. 

21. What SQL function lets you add strings together to create new text strings that can be used as unique keys? 

  • LENGTH
  • CONCAT (CORRECT)
  • CAST
  • COALESCE

Correct: The CONCAT function lets you add strings together to create new text strings that can be used as unique keys.

22. A data analyst is analyzing medical data for a health insurance company. The dataset contains billions of rows of data. Which of the following tools will handle the data most efficiently? 

  • A spreadsheet
  • A word processor 
  • A presentation 
  • SQL (CORRECT)

Correct: SQL will handle the data most efficiently. SQL can handle huge amounts of data.

23. A data analyst is managing a database of customer information for a retail store. What SQL command can the analyst use to add a new customer to the database? 

  • CREATE TABLE IF NOT EXISTS
  • INSERT INTO  (CORRECT)
  • UPDATE 
  • DROP TABLE IF EXISTS

Correct: The analyst can use the INSERT INTO command to add a new customer to the database.

24. A data analyst runs a SQL query to extract some data from a database for further analysis. How can the analyst save the data? Select all that apply. 

  • Run a SQL query to automatically save the data.
  • Use the UPDATE query to save the data.
  • Download the data as a spreadsheet. (CORRECT)
  • Create a new table for the data. (CORRECT)

Correct: The analyst can save the data by downloading the data as a spreadsheet or creating a new table for the data. 

Correct: The analyst can save the data by downloading the data as a spreadsheet or creating a new table for the data.

25. Fill in the blank: The _____ function can be used to return non-null values in a list.

  • COALESCE (CORRECT)
  • CAST 
  • TRIM 
  • CONCAT

Correct: The COALESCE function can be used to return non-null values in a list.

26. You are working with a database table named customer that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. You want to check for postal codes that are greater than 7 characters long. 

You write the SQL query below. Add a LENGTH function that will return any postal_code that is greater than 7 characters long.

NOTE: The three dots (…) indicate where to add the clause.

1 SELECT

2 *

3 FROM

4 customer

5 WHERE …

What is the last name of the customer that is in row 10 of your query result?

NOTE: The query index starts at 1 not 0.

  • Ramos
  • Brooks
  • Hughes (CORRECT)
  • Rocha

Correct: The function LENGTH(postal_code) > 7 will return any postal codes that are greater than 7 characters long. The complete query is SELECT * FROM customer WHERE LENGTH(postal_code) > 7. The LENGTH function counts the number of characters a string contains. Hughes is the last name of the customer that appears in row 10 of your query result.  

27. After a company merger, a data analyst receives a dataset with billions of rows of data. They need to leverage this data to identify insights for upper management. What tool would be most efficient for the analyst to use?

  • SQL (CORRECT)
  • Word processor
  • CSV
  • Spreadsheet

28. As a data analyst, you are working on a quick project containing a small amount of data. As the data was emailed to you, there is no need to query the data. What tool should you use to perform your analysis?

  • Spreadsheet (CORRECT)
  • CSV
  • word process
  • SQL

29. You are working with a database table named invoice that contains invoice data. The table includes columns for invoice_id and customer_id. You want to remove duplicate entries for customer_id and sort the results by invoice_id. 

You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the customer_id column. 

NOTE: The three dots (…) indicate where to add the clause.

1 SELECT …

2 FROM

3 invoice

4 ORDER BY

5 invoice_id

What customer ID number appears in row 12 of your query result?

NOTE: The query index starts at 1 not 0.

  • 42
  • 8
  • 16 (CORRECT)
  • 23

30. You’re working with a dataset that contains a float column with a significant amount of decimal places. This level of granularity is not needed for your current analysis. How can you convert the data in the float column to be integer data?

  • LENGTH
  • TRIM
  • SUBSTR
  • CAST (CORRECT)

31. You are working with a database table that contains invoice data. The table includes columns about billing location such as billing_city, billing_state, and billing_country. You use the SUBSTR function to retrieve the first 4 letters of each billing city name, and use the AS command to store the result in a new column called new_city.

You write the SQL query below. Add a statement to your SQL query that will retrieve the first 4 letters of each billing city name and store the result in a new column as new_city. 

NOTE: The three dots (…) indicate where to add the statement.

NOTE: SUBSTR takes in three arguments being column, starting_index, ending_index

1 SELECT

2 invoice_id,

3 …

4 FROM

5 invoice

6 ORDER BY

7 billing_city

What invoice ID number is in row 7 of your query result? 

NOTE: The query index starts at 1 not 0.

  • 390
  • 97
  • 23 (CORRECT)
  • 206

32. A junior data analyst joins a new company. The analyst learns that SQL is heavily utilized within the organization. Why would the organization choose to invest in SQL? Select all that apply.

  • SQL is a powerful software program.
  • SQL is a programming language that can also create web apps.
  • SQL can handle huge amounts of data. (CORRECT)
  • SQL is a well-known standard in the professional community. (CORRECT)

33. Your manager tasks you with analyzing a dataset and visually inspecting the data. Upon initial inspection you realize that this is a small dataset. What tool should you use to analyze the data?

  • Word processor
  • CSV
  • SQL
  • Spreadsheet (CORRECT)

34. A data analyst creates a database to store information on the company’s customer data. When completing the initial import the analyst notices that they forgot to add a few customers into the table. What command can the analyst use to add these missed customers?

  • INSERT INTO (CORRECT)
  • DROP
  • APPEND
  • ADD

35. You are working with a database table named invoice that contains invoice data. The table includes a column for customer_id. You want to remove duplicate entries for customer_id and get a count of total customers in the database. 

You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the customer_id column. 

NOTE: The three dots (…) indicate where to add the clause.

1

2

3

SELECT COUNT(…)

FROM

invoice

Run

Reset

What is the total number of customers in the database?

  • 59 (CORRECT)
  • 43
  • 84
  • 105

36. You are working with a database table that contains employee data. The table includes columns about employee location such as city, state, country, and postal_code. You use the SUBSTR function to retrieve the first 3 characters of each last_name, and use the AS command to store the result in a new column called new_last_name.

You write the SQL query below. Add a statement to your SQL query that will retrieve the first 3 characters of each last_name and store the result in a new column as new_last_name. 

NOTE: The three dots (…) indicate where to add the statement.

NOTE: SUBSTR takes in three arguments being column, starting_index, ending_index

1 SELECT

2 employee_id,

3 …

4 FROM

5 employee

6 ORDER BY

7 postal_code

What employee ID number is in row 8 of your query result?

NOTE: The query index starts at 1 not 0.

  • 3
  • 8
  • 7
  • 1 (CORRECT)

37. A data analyst is tasked with identifying what orders are still in transit. The current list of orders contains trillions of rows. What is the best tool for the analyst to use?

  • SQL (CORRECT)
  • Word processor
  • CSV
  • Spreadsheets

38. You’ve been working on a large project for your organization that has spanned many months. Throughout the project you have created multiple tables to save your progress and store data you may need later on. Because the project is ending soon, you decide to do some housekeeping and clean up the tables you will no longer need. What command will you use to accomplish this task?

  • DROP COLUMN IF EXISTS
  • DROP ROW IF EXISTS
  • DROP TABLE IF EXISTS (CORRECT)
  • DROP IF EXISTS TABLE

39. You are working with a database table named invoice that contains invoice data. The table includes a column for invoice_date. You want to remove duplicate entries for invoice_date. 

You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the invoice_date column. 

NOTE: The three dots (…) indicate where to add the clause.

1 SELECT …

2 FROM

3 invoice

What invoice_date is in row 17 of your query result?

NOTE: The query index starts at 1 not 0.

  • 2009-04-06
  • 2009-03-14 (CORRECT)
  • 2009-01-03
  • 2009-03-05

Correct: The clause DISTINCT invoice_date will remove duplicate entries from the billing_state column. The complete query is SELECT DISTINCT invoice_date FROM invoice.

40. You’re working with a dataset that contains a float column with a significant amount of decimal places. This level of granularity is not needed for your current analysis. How can you convert the data in the float column to be integer data?

  • TRIM
  • LENGTH
  • CAST (CORRECT)
  • SUBSTR

41.  Fill in the blank: The _____ function can be used to change the data type of a column.

  • TRIM
  • COALESCE
  • CONCAT
  • CAST (CORRECT)

42. Fill in the blank: The _____ function can be used to join strings to create a new column.

  • COALESCE
  • TRIM
  • CAST
  • CONCAT (CORRECT)

Cleaning Data with SQL conclusion

Learning how to clean data using SQL can make an analyst’s job much easier. In this part of the course, you’ll explore queries and functions that you can use in SQL to clean your data. This is a valuable skill for anyone looking to work with data. If you want to learn more about how to clean data using SQL, join the learning experience in Coursera.