Sparkling-Clean Data | Difference between a null and a zero in a dataset

Course 4 – Process Data from Dirty to Clean Quiz Answers

Week 2: Sparkling-Clean Data

GOOGLE DATA ANALYTICS PROFESSIONAL CERTIFICATION

Complete Study Guide

Sparkling-Clean Data Introduction

Data is the lifeblood of analysis, and sparkling-clean data is essential for any successful analysis. In Coursera’s Google Data Analytics Certificate program, you will learn how to identify and clean dirty data so that it can be used for meaningful insights and decision-making. You’ll also explore various techniques for cleaning data in spreadsheets and other tools, as well as methods for detecting incorrectly entered or missing values.

By understanding how to recognize and process dirty data into sparkling-clean information, you will gain valuable skills to help you become a more effective analyst. So if you want to take your analytics career up a notch, enroll now in Coursera’s Google Data Analytics Certificate program!

Learning Objectives

  • Differentiate between clean and dirty data
  • Explain the characteristics of dirty data
  • Describe data cleaning techniques with reference to identifying errors, redundancy, compatibility and continuous monitoring
  • Identify common pitfalls when cleaning data
  • Demonstrate an understanding of the use of spreadsheets to clean data

Test your knowledge on Clean versus dirty data

1. Describe the difference between a null and a zero in a dataset.

  • A null indicates that a value does not exist. A zero is a numerical response. (Correct)       
  • A null signifies invalid data. A zero is missing data.
  • A null represents a value of zero. A zero represents an empty cell.
  • A null represents a number with no significance. A zero represents the number zero.

Correct: A null indicates that a value does not exist. A zero is a numerical response.
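
The null-versus-zero distinction in question 1 can be illustrated with a minimal Python sketch. The course itself works in spreadsheets, so the `scores` list and the use of `None` as a stand-in for a null are assumptions made only for illustration. A zero counts toward an average, while a null has to be skipped.

```python
# Hypothetical survey scores: None stands in for a null (no value recorded),
# while 0 is a real numeric response.
scores = [4, 0, None, 5]

# Keep only the cells that actually hold a value; the zero stays in.
recorded = [s for s in scores if s is not None]

null_count = len(scores) - len(recorded)
average = sum(recorded) / len(recorded)

print(null_count)  # 1
print(average)     # 3.0
```

Treating the null as a zero would give an average of 2.25 instead of 3.0, which is why the two must not be confused.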

2. What are the most common processes and procedures handled by data engineers? Select all that apply.

  • Developing, maintaining, and testing databases and related systems. (Correct)
  • Transforming data into a useful format for analysis. (Correct)
  • Verifying results of data analysis
  • Giving data a reliable infrastructure. (Correct)

Correct: Data engineers transform data into a useful format for analysis; give it a reliable infrastructure; and develop, maintain, and test databases and related systems.

3. What are the most common processes and procedures handled by data warehousing specialists? Select all that apply.

  • Ensuring data is backed up to prevent loss. (Correct)
  • Ensuring data is available. (Correct)
  • Ensuring data is secure. (Correct)
  • Ensuring data is properly cleaned

Correct: Data warehousing specialists are responsible for ensuring data is available, secure, and backed up to prevent loss.

4. A data analyst is cleaning a dataset. They want to confirm that users entered five-digit zip codes correctly by checking the data in a certain spreadsheet column. What would be most helpful as the next step?

  • Using the MAX function to determine the maximum value in the cells in the column
  • Using the field length tool to specify the number of characters in each cell in the column. (Correct)
  • Formatting the cells in the column as number
  • Changing the column width to fit only five digits

Correct: Using the field length tool to specify the number of characters in each cell in the column would be the most helpful.
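
As a rough illustration of the same idea outside a spreadsheet, the sketch below checks field length in Python. The `zip_codes` values and the `has_valid_length` helper are hypothetical; the spreadsheet's field length tool is the approach the course describes.

```python
# Hypothetical zip code column, stored as text so leading zeros survive.
zip_codes = ["33511", "0210", "902101", "10001"]

def has_valid_length(value, length=5):
    """Return True when the value is exactly `length` digits long."""
    return len(value) == length and value.isdigit()

# Collect the entries that fail the five-digit check.
invalid = [z for z in zip_codes if not has_valid_length(z)]
print(invalid)  # ['0210', '902101']
```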

5. Review the final product of the spreadsheet you cleaned during this activity. Which of the following is the rightmost column?

  • Column AA
  • Column Z
  • Column AZ
  • Column AS (Correct)

Correct: In the final product of this activity, the rightmost column is Column AS. You are able to find this information after you properly transpose the data. Going forward, you can apply what you learned about data cleaning and transposing to work with data in the future.

Test your knowledge on data-cleaning techniques

1. Fill in the blank: Every database has its own formatting, which can cause the data to seem inconsistent. Data analysts use the _____ tool to create a clean and consistent visual appearance for their spreadsheets.

  • clear formats (Correct)
  • autocorrect
  • conditional formatting
  • spellcheck

Correct: Data analysts use the clear formats tool to create a clean and consistent visual appearance for their spreadsheets.

2. What is the process of combining two or more datasets into a single dataset?

  • Data transferring
  • Data merging (Correct)
  • Data composition
  • Data validation

Correct: Data merging is the process of combining two or more datasets into a single dataset.

3. Fill in the blank: In data analytics, _____ describes how well two or more datasets are able to work together.

  • suitability
  • alignment
  • compatibility (Correct)
  • agreement

Correct: Compatibility describes how well two or more datasets are able to work together.

4. Which of the following functions divides text around a specified character or string and puts each fragment of text into a separate cell in the row?

  • The TRIM function
  • The COUNTIF function
  • The CONCATENATE function
  • The SPLIT function (Correct)

Correct: The SPLIT function divides text around a specified character or string, and puts each fragment of text into a separate cell in the row. Spreadsheet functions are useful tools for data cleaning, and knowing how to use functions effectively is a key part of every data analyst’s skill set.
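
For readers who also work in code, a loose Python analogue of SPLIT is `str.split`. The cell content below is a made-up example, not data from the course.

```python
# Hypothetical cell content: a comma-delimited text string.
cell = "Brandon,FL,33511"

# str.split divides the text around the delimiter, much like SPLIT(cell, ",")
# places each fragment into its own cell in the row.
fragments = cell.split(",")
print(fragments)  # ['Brandon', 'FL', '33511']
```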

Test your knowledge on Cleaning data in spreadsheets

1. Describe the relationship between a text string and a substring.

  • A text string is a group of characters within a cell. A substring is a smaller subset of that text string. (Correct)
  • A text string is the list of attributes at the top of columns within a table. A substring is a single attribute within that list.
  • A text string is a column of data within a table. A substring is one cell within that column.
  • A text string is a row of data within a table. A substring is one cell within that row.

Correct: A text string is a group of characters within a cell. A substring is a smaller subset of that text string.

2. A data analyst uses the COUNTIF function to count the number of times a value less than 5 occurs between spreadsheet cells A2 through A100. What is the correct syntax?

  • =COUNTIF(A2:A100,"<5") (Correct)
  • =COUNTIF(A2:A100,">5")
  • =COUNTIF(A2:A100,>5)
  • =COUNTIF(A2:A100,<5)

Correct: The correct syntax is =COUNTIF(A2:A100,"<5"). COUNTIF returns the number of cells that match a specified value or condition. A2:A100 is the range, and "<5" is the condition.
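
The counting logic behind COUNTIF can be sketched in Python as follows. The `countif` helper and the sample `values` are hypothetical and only mirror the idea of counting cells that satisfy a condition.

```python
# Hypothetical column values standing in for cells A2:A100.
values = [3, 7, 5, 1, 4, 9]

def countif(cells, condition):
    """Count the cells for which `condition` returns True."""
    return sum(1 for cell in cells if condition(cell))

# Equivalent in spirit to =COUNTIF(A2:A100,"<5"): 3, 1 and 4 qualify.
count_below_5 = countif(values, lambda v: v < 5)
print(count_below_5)  # 3
```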

3. Fill in the blank: To remove leading, trailing, and repeated spaces in data, analysts use the ____ function.

  • RIGHT
  • TRIM (Correct)
  • LEFT
  • MID

Correct: TRIM is a function that removes leading, trailing, and repeated spaces in data.
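
A minimal Python sketch of the same behavior is shown below. The `trim` helper is an illustrative assumption, not a spreadsheet API; it handles space characters only.

```python
def trim(text):
    """Remove leading and trailing spaces and collapse repeated spaces."""
    # Splitting on a single space yields empty strings wherever spaces repeat;
    # dropping those and re-joining leaves exactly one space between words.
    return " ".join(part for part in text.split(" ") if part)

print(repr(trim("  Data   cleaning  ")))  # 'Data cleaning'
```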

Process Data from Dirty to Clean Weekly Challenge 2

1. Conditional formatting is a spreadsheet tool that changes how cells appear when values meet a specific condition. Data analysts can use conditional formatting to do which of the following tasks? Select all that apply.

  • To make cells stand out for more efficient analysis. (Correct)
  • To sort data in series of cells into a meaningful order
  • To identify blank cells or missing information (Correct)
  • To calculate mathematical equations

Correct: Data analysts use conditional formatting to identify blank cells or missing information and to make cells stand out for more efficient analysis.
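
Conditional formatting is a visual tool, but the rule behind "highlight blank cells" can be sketched in Python. The `cells` list is hypothetical, and treating both `None` and an empty string as blank is an assumption of this sketch.

```python
# Hypothetical column: an empty string or None represents a blank cell.
cells = ["Marketing", "", "Strategy", None, "Graphics"]

# Record the 1-based row numbers that a "cell is empty" rule would highlight.
blank_rows = [row for row, value in enumerate(cells, start=1)
              if value is None or value == ""]
print(blank_rows)  # [2, 4]
```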

2. A data analyst uses the SPLIT function to divide a text string around a specified character and put each fragment into a new, separate cell. What is the specified character separating each item called?

  • Delimiter (Correct)
  • Partition
  • Unit
  • Substring

Correct: When using the SPLIT function, the specified character separating each item is called a delimiter.

3. For a function to work properly, data analysts must follow each function’s predetermined structure. What is this structure called?

  • Summary
  • Algorithm
  • Validation
  • Syntax (Correct)

Correct: This structure is called syntax. Syntax is a predetermined structure that includes all required information and its proper placement.

4. You are working with the following selection of a spreadsheet:

[Image: Course_4_Weekly_Challenge_2.1 (spreadsheet selection)]

In order to extract the five-digit postal code from Brandon, FL, what is the correct function?

  • =RIGHT(5,B4)
  • =LEFT(5,B4)
  • =LEFT(B4,5)
  • =RIGHT(B4,5) (Correct)

Correct: The correct syntax is =RIGHT(B4,5). The RIGHT function returns a set number of characters from the right side of a text string. B4 is the specified cell. And 5 is the number of characters to return.
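
In Python, the RIGHT and LEFT functions correspond roughly to string slicing. The `address` value below is a hypothetical stand-in for the contents of cell B4, which is not reproduced here, and `right` and `left` are illustrative helpers.

```python
def right(text, n):
    """Return the last n characters of text, like =RIGHT(text, n)."""
    return text[-n:] if n > 0 else ""

def left(text, n):
    """Return the first n characters of text, like =LEFT(text, n)."""
    return text[:n]

address = "Brandon, FL 33511"  # hypothetical cell content
print(right(address, 5))  # 33511
print(left(address, 7))   # Brandon
```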

5. A data analyst in a human resources department is working with the following selection of a spreadsheet:

[Image: Course_4_Weekly_Challenge_2.2 (spreadsheet selection)]

They want to create employee identification numbers (IDs) in column D. The IDs should include the year hired plus the last four digits of the employee’s Social Security Number (SS#). What function will create the ID 20201939 for the employee in row 4?

  • =CONCATENATE(A4*B4)
  • =CONCATENATE(A4!B4)
  • =CONCATENATE(A4,B4) (Correct)
  • =CONCATENATE (A4+B4)

Correct: To create the ID 20201939 for the employee in row 4, the function is =CONCATENATE(A4,B4). CONCATENATE joins together two or more text strings. (A4,B4) are the locations of the strings to be joined.
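
The difference between joining and adding can be sketched in Python. The values 2020 and 1939 come from the question; holding them in variables named `year_hired` and `last_four` is an assumption made for illustration.

```python
year_hired = 2020
last_four = 1939

# Joining the values as text, as CONCATENATE(A4,B4) does, builds the ID.
employee_id = str(year_hired) + str(last_four)
print(employee_id)  # 20201939

# Adding them as numbers gives a different result, which is why the
# A4+B4 option does not produce the ID.
print(year_hired + last_four)  # 3959
```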

6. A data analyst at an e-commerce company is working with a spreadsheet containing last month’s sales. The most expensive product their company sells costs $49.99, so they want to quickly confirm that all of the data in the Sales column is $49.99 or less. What function can they use?

  • SUMIF
  • COUNT
  • SUM
  • COUNTIF (Correct)

Correct: They can use COUNTIF, which is a function that returns the number of cells that match a specified value or parameter.

7. The V in VLOOKUP stands for what?

  • Visual
  • Vertical (Correct)
  • Variable
  • Virtual

Correct: The V in VLOOKUP stands for vertical. VLOOKUP is a spreadsheet function that vertically searches for a certain value in a column to return a corresponding piece of information.
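
A simplified Python sketch of a vertical lookup is shown below. The `products` table and the `vlookup` helper are hypothetical, and the sketch only covers exact matches; it is not a full reimplementation of the spreadsheet function.

```python
# Hypothetical lookup table: each row is (product_id, name, price).
products = [
    ("P-100", "Notebook", 3.50),
    ("P-200", "Backpack", 49.99),
    ("P-300", "Pen", 1.25),
]

def vlookup(search_key, rows, index):
    """Search the first column top to bottom for search_key and return the
    value in the 1-based column `index` of the first matching row.
    Returns None when nothing matches (a spreadsheet would show #N/A)."""
    for row in rows:
        if row[0] == search_key:
            return row[index - 1]
    return None

print(vlookup("P-200", products, 3))  # 49.99
```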

8. True or false: To evaluate how well two or more data sources work together, data analysts use data mapping.

  • True (Correct)
  • False

Correct: To evaluate how well two or more data sources work together, data analysts use data mapping.

9. An analyst is cleaning a new dataset containing 500 rows. They want to make sure the data contained from cell B2 through cell B300 does not contain a number greater than 50. Which of the following COUNTIF function syntaxes could be used to answer this question? Select all that apply.

  • =COUNTIF(B2:B300,>50)
  • =COUNTIF(B2:B300,<=50)
  • =COUNTIF(B2:B300,">50") (CORRECT)
  • =COUNTIF(B2:B300,"<=50") (CORRECT)

Correct: One possible syntax is =COUNTIF(B2:B300,">50"). This returns the number of cells that are greater than 50. Another option is =COUNTIF(B2:B300,"<=50"). This returns the number of cells that are less than or equal to 50. Either one can confirm that the data does not contain a number greater than 50.
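
The reasoning in question 9 can be sketched in Python: either count is enough to confirm that nothing exceeds 50. The `column_b` values are hypothetical.

```python
# Hypothetical values standing in for cells B2:B300.
column_b = [12, 50, 37, 8, 49]

# Check 1: the count of values above 50 should be zero.
over_50 = sum(1 for v in column_b if v > 50)

# Check 2: the count of values at or below 50 should equal the number of cells.
at_most_50 = sum(1 for v in column_b if v <= 50)

print(over_50 == 0)                   # True
print(at_most_50 == len(column_b))    # True
```

Both checks agree because every numeric cell falls into exactly one of the two groups.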

10. A delimiter is a character that indicates the beginning or end of a data item. The split text to columns tool uses a delimiter to accomplish what task?

  • To change the format of a column of text 
  • To split one column into two
  • To specify where to split a text string (CORRECT)
  • To split duplicate substrings

Correct: The split text to columns tool uses a delimiter to specify where to split a text string.

11. Fill in the blank: When describing a SUM function, the _____ is =SUM(value 1 through value 2).

  • standard
  • structure
  • script
  • syntax (CORRECT)

Correct: When describing a SUM function, the syntax is =SUM(value 1 through value 2).

12. VLOOKUP searches for a value in a row in order to return a corresponding piece of information.

  • True
  • False (CORRECT)

Correct: VLOOKUP searches for a value in a column in order to return a corresponding piece of information.

13. A data analyst needs to combine two datasets. Each dataset comes from a different system, and the systems store data in different ways. What can the data analyst do to ensure the data is compatible?

  • Merge the data
  • Use a data visualization
  • Map the data (CORRECT)
  • Apply a data structure

Correct: Data analysts use data mapping to note differences in data sources in order to ensure the data is compatible.
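
One way to picture data mapping is a small Python sketch in which fields from one system are matched to the field names of another before the records are combined. The record layouts, the `field_map` dictionary, and the `apply_map` helper are all hypothetical.

```python
# Hypothetical records from two systems that name the same fields differently.
system_a = [{"cust_id": 1, "city": "Bogota"}]
system_b = [{"CustomerID": 2, "City": "Bogota"}]

# The data map: each field in system B is matched to its field in system A.
field_map = {"CustomerID": "cust_id", "City": "city"}

def apply_map(record, mapping):
    """Rename the record's fields according to the mapping."""
    return {mapping.get(key, key): value for key, value in record.items()}

# Once mapped, the two sources share a structure and can be combined.
combined = system_a + [apply_map(r, field_map) for r in system_b]
print(combined)
```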

14. A data analyst wants to search for a certain value in a column, then return a corresponding piece of information. Which function should they use?

  • MATCH
  • FIND
  • VALUE
  • VLOOKUP (CORRECT)

Correct: VLOOKUP is used to search for a certain value in a column, then return a corresponding piece of information.

15. Fill in the blank: Data mapping is the process of _____ fields from one data source to another.

  • extracting
  • linking
  • matching (CORRECT)
  • merging

Correct: Data mapping is the process of matching fields from one data source to another.

16. An analyst is working on a project involving customers from Bogota, Colombia. They receive a spreadsheet with 5,000 rows of customer information. What function can they use to confirm that the column for City contains the word Bogota exactly 5,000 times?

  • COUNT
  • SUMIF
  • SUM
  • COUNTIF (CORRECT)

Correct: They can use COUNTIF, which is a function that returns the number of cells that match a specified value.

17. Fill in the blank: Conditional formatting is a spreadsheet tool that changes how _____ appear when values meet a specific condition. 

  • queries
  • charts
  • filters
  • cells (CORRECT)

Correct: Conditional formatting is a spreadsheet tool that changes how cells appear when values meet a specific condition.

18. A data analyst suspects that there are many blank cells in their spreadsheet corresponding to missing information. What spreadsheet tool can they use to identify only those cells containing the null values?

  • Conditional ranking
  • Conditional formatting (CORRECT)
  • Cell filtering
  • Cell querying

19. A data analyst is working on a spreadsheet in which one of the columns is name data. This data is formatted as lastname, firstname. The analyst chooses to divide this data into two new columns, one containing the firstname data and the other containing the lastname data. What spreadsheet tool would they use to do this?

  • The MID function
  • Substring formatting
  • The SPLIT function (CORRECT)
  • Conditional formatting

20. A data analyst is using a function in a spreadsheet. When they input the function, they follow a predetermined structure that includes all required information for the function and its proper placement. What aspect of a function does this describe?

  • The specified value of the function
  • The length of the function
  • The number of characters in the function
  • The syntax of the function (CORRECT)

21. As part of the data-cleaning process, a data analyst creates a rule to highlight any empty cells in a bright blue color. This is an example of data visualization.

  • True
  • False (CORRECT)

22. A data analyst is working on a spreadsheet in which one of the columns contains name data. This data is formatted as lastname_firstname. The analyst splits this data at the underscore so that each piece (firstname and lastname) is contained in its own column.

In this context, what is the underscore acting as?

  • MID function
  • Partition
  • Delimiter (CORRECT)
  • Substring

23. In a spreadsheet, what is the correct function for extracting the first two characters of the string located in cell A7?

  • =LEFT(A7,2) (CORRECT)
  • =RIGHT(2,A7)
  • =RIGHT(A7,2)
  • =LEFT(2,A7)

24. A data analyst in a human resources department is working with the following selection of a spreadsheet:

N/A | A          | B             | C          | D
1   | Year Hired | Last 4 of SS# | Department | Employee ID
2   | 2019       | 1192          | Marketing  |
3   | 2014       | 2683          | Operations |
4   | 2020       | 1939          | Strategy   |
5   | 2009       | 3208          | Graphics   |

They want to create employee identification numbers (IDs) in column D. The IDs should include the last four digits of the employee’s Social Security Number(SS#) plus the year hired. What function will create the ID 26832014 for the employee in row 3?

  • =CONCATENATE(B3,A3) (CORRECT)
  • =CONCATENATE(B3+A3)
  • =CONCATENATE(A3!B3)
  • =CONCATENATE(A3+B3)

25. An analyst is cleaning a new dataset. They want to make sure the data contained from cell C4 through cell C350 contains only numbers below 40. Choose the statements that include the correct syntax for this COUNTIF function. Select all that apply.

  • =COUNTIF(C4:C350, <=40)
  • =COUNTIF(C4:C350, >40)
  • =COUNTIF(C4:C350,"<40") (CORRECT)
  • =COUNTIF(C4:C350,">=40") (CORRECT)

26. Before analyzing a dataset, an analyst maps the data. What is the reason for doing this?

  • The dataset has no visualizations.
  • The analyst thinks the dataset might have some null values.
  • The dataset contains data from different sources. (CORRECT)
  • The analyst wants to know what attributes the data has.

27. Fill in the blank: In order to make your spreadsheet easier to analyze, you choose to alter the way cells appear if their values meet certain conditions. The spreadsheet tool that you use to do this is called _____.

  • conditional ranking
  • cell querying
  • cell filtering
  • conditional formatting (CORRECT)

28. Fill in the blank: A _____ is a specified text that the SPLIT function uses to determine where a text string is to be divided.

  • partition
  • unit
  • substring
  • delimiter (CORRECT)

29. An analyst is working on a project involving customers from Bogota, Colombia. They receive a spreadsheet with 5,000 rows of customer information. What function can they use to confirm that the column for City contains the word Bogota exactly 5,000 times?

  • SUMIF
  • COUNTIF (CORRECT)
  • COUNT
  • SUM

30. Fill in the blank: The function _____ is used to return information in a column that contains a specified value.

  • MATCH
  • FIND
  • VLOOKUP (CORRECT)
  • VALUE

31. An analyst is cleaning a new dataset. They want to make sure the data contained from cell B2 through cell B100 does not contain a number smaller than 10. Which COUNTIF function syntax can be used to answer this question?

  • =COUNTIF(B2:B100,"<9")
  • =COUNTIF(B2:B100,">=10") (CORRECT)
  • =COUNTIF(B2:B100,>50)
  • =COUNTIF(B2:B200,"<=50")

32. A data analyst is using a function in a spreadsheet. For the function to work correctly, they follow the function’s syntax. What does this entail?

  • It is how the function can be used in a program.
  • It is the purpose of the function and its use.
  • It is the function’s name and placement.
  • It is the function’s required information and its proper placement. (CORRECT)

33. A data analyst needs to combine two datasets. Each dataset comes from a different system, and the systems store data in different ways. What can the data analyst do to ensure the data is compatible prior to analyzing the data?

  • Map the data (CORRECT)
  • Use a data visualization
  • Apply a data structure
  • Spot check for null values

Sparkling-Clean Data Conclusion

Spreadsheets are powerful tools for data analysts because they can be used to clean data. In this section of the course, you learned about the difference between clean and dirty data. You also explored different data cleaning techniques using spreadsheets and other tools.

By understanding how to clean data, you’ll be able to make more accurate analyses in your work. If you want to learn more about data analytics, join the learning experience on Coursera. Thanks for taking this course!