What is one potential problem associated with data manipulation

Course 4 – Process Data from Dirty to Clean Quiz Answers

Week 1: The Importance of Integrity

GOOGLE DATA ANALYTICS PROFESSIONAL CERTIFICATION

Complete Study Guide

The Importance of Integrity: Introduction

When it comes to making decisions, data integrity is key. Data integrity refers to the accuracy, completeness, consistency, and trustworthiness of data over its entire life cycle. In this post, we look at one potential problem associated with data manipulation that analysts must be aware of when analyzing data. Stakeholders also need to be sure that data has been collected, stored, and processed correctly before they can trust any insights derived from it. That’s why Coursera’s Google Data Analytics Professional Certificate program devotes a module to helping you understand the importance of data integrity and how to maintain it.

You can expect to learn about techniques used by analysts when deciding which data should be collected for analysis—as well as structured and unstructured data, different types of data formats, and more.

Learning Objectives

  • Describe statistical measures associated with data integrity including statistical power, hypothesis testing, and margin of error
  • Describe strategies that can be used to address insufficient data
  • Discuss the importance of sample size with reference to sample bias and random samples
  • Describe the relationship between data and related business objectives
  • Define data integrity with reference to types and risks
  • Discuss the importance of pre-cleaning activities

Test your knowledge on data integrity and analytics objectives

1. Which of the following principles are key elements of data integrity? Select all that apply.

  • Accuracy (Correct)
  • Consistency (Correct)
  • Trustworthiness (Correct)
  • Selectivity

Correct: Data integrity is the accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle.

2. Which process do data analysts use to make data more organized and easier to read?

  • Data uniformity
  • Data manipulation (Correct)
  • Data transfer
  • Data replication

Correct: To make data more organized and easier to read, data analysts use data manipulation.

3. Before analysis, a company collects data from countries that use different date formats. Which of the following updates would improve the data integrity?

  • Change all of the dates to the same format (Correct)
  • Remove data in an unfamiliar date format
  • Organize the data by country
  • Leave the dates in their current formats

Correct: Changing all of the dates to the same format would improve the data integrity.
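
As a rough illustration of what "changing all of the dates to the same format" can look like in practice, here is a minimal Python sketch. The country codes, the format mapping, and the sample records are assumptions made up for this example; they are not part of the course material.

```python
from datetime import datetime

# Hypothetical mapping from country code to the date format used in its records.
COUNTRY_DATE_FORMATS = {
    "US": "%m/%d/%Y",  # month/day/year
    "GB": "%d/%m/%Y",  # day/month/year
    "JP": "%Y/%m/%d",  # year/month/day
}


def to_iso_date(raw: str, country: str) -> str:
    """Parse a date string with the country's format and return YYYY-MM-DD."""
    fmt = COUNTRY_DATE_FORMATS[country]
    return datetime.strptime(raw, fmt).date().isoformat()


# Made-up records: the same text "03/04/2021" means different days in the US and GB.
records = [("03/04/2021", "US"), ("03/04/2021", "GB"), ("2021/04/03", "JP")]
normalized = [to_iso_date(raw, country) for raw, country in records]
print(normalized)  # ['2021-03-04', '2021-04-03', '2021-04-03']
```

Note that the same string can mean two different days depending on the country, which is why the dates are converted rather than removed or left as they are.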

4. Which of the following processes helps ensure a close alignment of data and business objectives?

  • Maintain data integrity (Correct)
  • Completing data replication
  • Having data update automatically during analysis
  • Transferring data multiple times

Correct: Maintaining data integrity helps ensure a close alignment of data and business objectives because the data is likely to be accurate, complete, consistent, and trustworthy.

5. When gathering data through a survey, companies can save money by surveying 100% of a population.

  • False (Correct)
  • True

Correct: Surveying 100% of a population is ideal, but it can be very expensive to gather data from an entire population.

Test your knowledge on insufficient data

1. What should an analyst do if they do not have the data needed to meet a business objective? Select all that apply.

  • Gather related data on a small scale and request additional time to find more complete data. (Correct)
  • Continue with the analysis using data from less reliable sources.
  • Create and use hypothetical data that aligns with analysis predictions.
  • Perform the analysis by finding and using proxy data from other datasets. (Correct)

Correct: If an analyst does not have the data needed to meet a business objective, they should gather related data on a small scale and request additional time. Then, they can find more complete data or perform the analysis by finding and using proxy data from other datasets.

2. Which of the following are limitations that might lead to insufficient data? Select all that apply.

  • Data that updates continually (Correct)
  • Duplicate data
  • Outdated data (Correct)
  • Data from a single source (Correct)

Correct: Limitations that might lead to insufficient data include data that updates continually, outdated data, and data from a single source.

3. A data analyst wants to find out how many people in Utah have swimming pools. It’s unlikely that they can survey every Utah resident. Instead, they survey enough people to be representative of the population. This describes what data analytics concept?

  • Confidence level
  • Margin of error
  • Sample (Correct)
  • Statistical significance

Correct: This describes a sample, which is a part of a population that is representative of the whole.

Test your knowledge on testing your data

1. A research team runs an experiment to determine if a new security system is more effective than the previous version. What type of results are required for the experiment to be statistically significant?

  • Results that are real and not caused by random chance (Correct)
  • Results that are unlikely to occur again
  • Results that are inaccurate and should be ignored
  • Results that are hypothetical and in need of more testing

Correct: In order for an experiment to be statistically significant, the results should be real and not caused by random chance.

2. In order to have a high confidence level in a customer survey, what should the sample size accurately reflect?

  • The most valuable members of the population
  • The trends from other customer surveys
  • The predictions of stakeholders
  • The entire population (Correct)

Correct: In order to have a high confidence level in a customer survey, the sample size should accurately reflect the entire population.

3. A data analyst determines an appropriate sample size for a survey. They can check their work by making sure the confidence level percentage plus the margin of error percentage add up to 100%.

  • True
  • False (Correct)

Correct: The confidence level percentage and margin of error percentage do not have to add up to 100%. They are independent of each other.
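
To see that the two percentages do not need to sum to 100%, here is a small Python sketch that estimates the margin of error for a survey proportion. It assumes the common normal approximation, and the function name, the sample size of 1,000, and the worst-case proportion of 0.5 are choices made for this illustration only.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(confidence: float, sample_size: int, proportion: float = 0.5) -> float:
    """Approximate margin of error for a sample proportion (normal approximation)."""
    # z-score that leaves (1 - confidence) split evenly across both tails.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * sqrt(proportion * (1 - proportion) / sample_size)


# Assumed survey: 95% confidence level, 1,000 respondents.
moe = margin_of_error(confidence=0.95, sample_size=1000)
print(f"confidence level: 95%, margin of error: {moe:.1%}")
# confidence level: 95%, margin of error: 3.1%
```

Under these assumptions, a 95% confidence level goes with a margin of error of about 3.1%, so adding the two numbers is not a meaningful check.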

Test your knowledge on margin of error

1. Fill in the blank: Margin of error is the _____ amount that the sample results are expected to differ from those of the actual population.

  • maximum (Correct)
  • minimum
  • median
  • average

Correct: Margin of error is the maximum amount that the sample results are expected to differ from those of the actual population.

2. In a survey about a new cleaning product, 75% of respondents report they would buy the product again. The margin of error for the survey is 5%. Based on the margin of error, what percentage range reflects the population’s true response?

  • Between 75% and 80%
  • Between 73% and 78%
  • Between 70% and 80% (Correct)
  • Between 70% and 75%

Correct: Based on the margin of error, between 70% and 80% accurately reflects the population’s true response.
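
The arithmetic behind this answer is simply the sample result minus and plus the margin of error. A minimal Python sketch (the function name is made up for this example):

```python
def response_range(sample_pct: float, margin_pct: float) -> tuple:
    """Return the (low, high) range implied by a sample result and its margin of error."""
    return (sample_pct - margin_pct, sample_pct + margin_pct)


low, high = response_range(sample_pct=75, margin_pct=5)
print(f"Between {low}% and {high}%")  # Between 70% and 80%
```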

GOOGLE DATA ANALYTICS COURSERA ANSWERS AND STUDY GUIDE

Process Data from Dirty to Clean Weekly Challenge 1

1. Which of the following conditions are necessary to ensure data integrity? Select all that apply.

  • Accuracy (Correct)
  • Statistical power
  • Completeness (Correct)
  • Privacy

Correct: Accuracy and completeness are necessary to ensure data integrity.

2. What is one potential problem associated with data manipulation that analysts must be aware of?

  • Data manipulation can help organize a dataset.
  • Data manipulation can introduce errors. (Correct)
  • Data manipulation can make a dataset easier to read.
  • Data manipulation can separate a dataset among different locations.

Correct: Data manipulation is the process of changing data to make it more organized and easier to read. However, it can sometimes introduce errors.

3. A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Which of the following questions can the analyst use this dataset to address? Select all that apply.

  • What was the reason for the population increase in a certain country?
  • What was the effect of migration on the population of a certain country?
  • What was the difference in population between two specific countries in 2018? (Correct)
  • What was the average population of a certain country from 2015 through 2020? (Correct)

Correct: The analyst could use the dataset to find the average population of a certain country from 2015 through 2020 and the difference in population between two specific countries in 2018.
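
A short Python sketch of the two questions the dataset can answer. The country names and population figures below are invented placeholders (in millions), not real data. The dataset only holds totals, so nothing in it could explain why a population changed.

```python
from statistics import mean

# Hypothetical population totals in millions, keyed by (country, year).
population = {
    ("Country A", 2015): 100, ("Country A", 2016): 102, ("Country A", 2017): 104,
    ("Country A", 2018): 106, ("Country A", 2019): 108, ("Country A", 2020): 110,
    ("Country B", 2018): 56,
}

# Average population of one country from 2015 through 2020.
avg_a = mean(population[("Country A", year)] for year in range(2015, 2021))

# Difference in population between two countries in 2018.
diff_2018 = population[("Country A", 2018)] - population[("Country B", 2018)]

print(avg_a, diff_2018)  # 105 50
```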

4. A data analyst is given a dataset for analysis. To use the template for this dataset, click the link below and select “Use Template.”

The analyst notices a limitation with the data in rows 8 and 9. What is the limitation?

  • Row 8 and row 9 show the wrong currency.
  • Row 9 is a duplicate of row 8. (Correct)
  • Row 9 needs more data.
  • Row 8 is not in the correct format.

Correct: Row 9 is a duplicate of row 8. Duplicate data is a limitation because it will lead to faulty analysis.
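
Duplicates like this are easier to catch with a quick check than by eye. The sketch below uses made-up invoice rows (client, date, amount) rather than the actual template data, and the helper names are assumptions for illustration.

```python
from collections import Counter

# Hypothetical invoice rows: (client, date, amount). The third row repeats the first.
rows = [
    ("Client A", "2/18/2014", "1250.00"),
    ("Client B", "2/21/2014", "980.00"),
    ("Client A", "2/18/2014", "1250.00"),
]


def find_duplicates(rows):
    """Return each row that appears more than once."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n > 1]


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


print(find_duplicates(rows))       # [('Client A', '2/18/2014', '1250.00')]
print(len(drop_duplicates(rows)))  # 2
```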

5. A data analyst is working on a project about the global supply chain. They have a dataset with lots of relevant data from Europe and Asia. However, they decide to generate new data that represents all continents. What type of insufficient data does this scenario describe?

  • Data that keeps updating
  • Data from only one source
  • Data that’s geographically limited (Correct)
  • Data that’s outdated

Correct: This example describes data that is insufficient because it’s geographically limited. If the analytics project has a global focus, the dataset should also be global.

6. In the data analysis process, how does a sample relate to a population?

  • A sample is a part of a population that is representative of the population. (Correct)
  • A sample is an ideal example taken from a population.
  • A sample is a duplicate selection of data that is taken from the population.
  • A sample is an average of all the data that represents the population.

Correct: A sample relates to a population by representing a population at a smaller scale.

7. A restaurant gathers data about a new dish by providing free samples to parties of six or more diners. What does this scenario describe?

  • Unbiased sampling
  • Random sampling
  • Geographically limited sampling
  • Sampling bias (Correct)

Correct: This scenario describes sampling bias because parties of six or more are not representative of the population as a whole.
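
To make the bias concrete, the following Python sketch compares a sample restricted to parties of six or more with a simple random sample. The party sizes are invented for illustration, and the random seed is fixed only so the run is repeatable.

```python
import random

# Hypothetical party sizes for one evening of service.
party_sizes = [2, 2, 4, 1, 6, 3, 2, 8, 4, 2, 5, 7, 2, 3, 4, 2]


def average(values):
    return sum(values) / len(values)


# Biased sample: only parties of six or more receive the free dish.
biased_sample = [size for size in party_sizes if size >= 6]

# Random sample: every party has the same chance of being selected.
rng = random.Random(42)
random_sample = rng.sample(party_sizes, k=5)

print("population average party size:", average(party_sizes))   # 3.5625
print("biased sample average party size:", average(biased_sample))  # 7.0
print("random sample:", random_sample)
```

The biased sample contains only large groups, so any feedback it produces reflects large parties rather than diners as a whole.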

8. Data and business objectives might not align for a number of reasons. Which of the following issues can prevent alignment? Select all that apply.

  • Sampling bias (Correct)
  • Data integrity
  • Insufficient data (Correct)
  • Data visualization

Correct: Insufficient data and sampling bias can prevent alignment.

9. Fill in the blank: Data _____ refers to the accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle.

  • integrity (CORRECT)
  • analysis
  • sampling
  • replication

Correct: Data integrity is the accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle.

10. A healthcare company keeps copies of their data at several locations across the country. The data becomes compromised because each location creates a copy of the original at different times of day. Which of the following processes caused the compromise?

  • Data manipulation
  • Data transfer
  • Data gathering
  • Data replication (CORRECT)

Correct: Data replication caused the compromise. Data replication is the process of storing data in multiple locations. If not done properly, replication can compromise integrity and cause inconsistencies.

11. A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Based on the available data, an analyst would be able to determine the reasons behind a certain country’s population increase from 2016 to 2017.

  • True
  • False (CORRECT)

Correct: The dataset contains only population totals, so the analyst would need more data to determine the reasons behind the population increase.

12. A data analyst at a nonprofit organization is working with a dataset about a summer fundraiser. Although they have a lot of useful data by the end of the month, they recognize that the data is insufficient. So, they decide to wait until the end of the season to begin working with the dataset. Which type of insufficient data does this example describe?

  • Data that keeps updating (CORRECT)
  • Outdated data
  • Geographically limited data
  • Data from only one source

Correct: This example describes insufficient data that keeps updating. If a dataset keeps updating, that means the data is still incoming and might be incomplete. 

13. Fill in the blank: Sampling bias in data collection happens when a sample isn’t representative of _____.

  • the population as a whole (CORRECT)
  • the population most affected by the data 
  • a dataset about the population
  • a subset of the population

Correct: Sampling bias in data collection happens when a sample isn’t representative of the population as a whole. 

14. Sometimes during analysis, an analyst discovers that it’s necessary to adjust the business objective. When this happens, the analyst should take the initiative to do so without involving others in order to be respectful of their time.

  • True
  • False (CORRECT)

Correct: If a data analyst believes the business objective should be adjusted, it’s important to first have a discussion with stakeholders.

15. A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Which of the following questions would the analyst need more data to address? 

  • What was the population of a certain country in 2020?
  • Which country had the greatest population in 2015?
  • Which country had the smallest population in 2017?
  • What was the reason for the population increase in a certain country? (CORRECT)

Correct: The analyst would need more data to identify the reason for a population increase.

16. A restaurant wants to gather data about a new dish by giving out free samples and asking for feedback. Who should the restaurant give samples to?

  • All diners (CORRECT)
  • 80% of diners 
  • Diners who spend the most money on their meal 
  • Diners who are willing to pay for the samples

Correct: The restaurant should give samples to all diners.

17. A data analyst at a software company wants to learn more about industry competitors. Because the software industry has more mergers than any other field, the companies and their products are constantly evolving. The analyst has a dataset from three years ago, and they notice that many of the companies and products in the dataset have changed. What makes the analyst decide that the data is insufficient, so they should generate fresh data instead?

  • It is geographically limited data
  • It is data from only one source
  • It is outdated data (CORRECT)
  • It is data that keeps updating

Correct: This example describes outdated data, which is insufficient. If a dataset is outdated, that means the data is old and probably no longer relevant.

18. Fill in the blank: If a data analyst is using data that has been _____, the data will lack integrity and the analysis will be faulty.

  • compromised (CORRECT)
  • clean
  • wide
  • public

Correct: If a data analyst is using data that has been compromised, the data will lack integrity and the analysis will be faulty.

19. A financial analyst imports a dataset to their computer from a storage device. As it’s being imported, the connection is interrupted, which compromises the data. Which of the following processes caused the compromise?

  • Data analysis
  • Data manipulation
  • Data transfer (CORRECT)
  • Data gathering

Correct: Data transfer caused the compromise. When a data transfer is interrupted, it can result in an incomplete dataset.
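
One common way to detect an interrupted transfer is to compare a checksum of the source with a checksum of what arrived. This is a minimal Python sketch using SHA-256. The CSV content is made up, and the cut-off transfer is simulated by truncating the bytes in memory.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def transfer_is_intact(original: bytes, received: bytes) -> bool:
    """True when the received bytes hash to the same digest as the original."""
    return sha256_hex(original) == sha256_hex(received)


# Hypothetical file contents on the storage device.
source = b"invoice_id,amount\n1,100\n2,250\n3,75\n"

# Simulate a connection that drops partway through the import.
interrupted = source[:20]

print(transfer_is_intact(source, source))       # True
print(transfer_is_intact(source, interrupted))  # False
```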

20. Fill in the blank: As a data analyst, you need to verify that your data is _____ to ensure your analysis and conclusions are accurate.

  • complete and valid (CORRECT)
  • manipulated and replicated
  • private and valid
  • manipulated and valid

21. A data analyst is given a dataset for analysis. To use the template for this dataset, click the link below and select “Use Template.”

Link to template: June 2014 Invoices

OR

If you don’t have a Google account, download the CSV file directly from the attachment below.

Which of the following has duplicate data?

  • Data for Symteco on 2/21/2014
  • Data for Symteco on 5/20/2014
  • Data for Valando on 1/1/2014
  • Data for Valando on 2/18/2014 (CORRECT)

22. A clothing manufacturer wants to learn more about why their consumers have purchased the brand’s products. How should this manufacturer conduct their survey?

  • Send the survey to their least frequent customers
  • Send the survey to a representative sample of their customers (CORRECT)
  • Send the survey to customers who have purchased more than one product
  • Send the survey to random people who buy clothes

23. A car dealership gathers data about their entire customer population. They decide to conduct a survey to understand why their customers chose their dealership. They send out an email to all customers who have purchased more than two vehicles in the past five years. What does this scenario describe?

  • Unbiased sampling
  • Random sampling
  • Sampling bias (CORRECT)
  • Geographically limited sampling

24. What can jeopardize data integrity throughout its lifecycle? Select all that apply.

  • Insufficient data
  • Malware (CORRECT)
  • System failures (CORRECT)
  • Human error (CORRECT)

25. A data analyst needs to migrate data from a server located at their company’s headquarters to a remote site. This can lead to what type of data integrity issue? Select all that apply.

  • Data cleaning
  • Data manipulation
  • Data replication (CORRECT)
  • Data transfer (CORRECT)

26. A data analyst, working for a publishing company, gathers a dataset which includes all books sold in the United Kingdom over the last three years. However, they decide to generate new data that represents global book sales. What type of insufficient data does this scenario describe?

  • Data from only one source
  • Data that is outdated
  • Data that is geographically limited (CORRECT)
  • Data that keeps updating

27. A car manufacturer wants to learn more about the brand preferences of electric car owners. There are millions of electric car owners in the world. Who should the company survey?

  • A sample of all electric car owners (CORRECT)
  • A sample of car owners who have owned more than one electric car
  • The entire population of electric car owners
  • A sample of car owners who most recently bought an electric car

28. A restaurant gathers data about a new dish by providing free samples to parties of six or more diners. What does this scenario describe?

  • Unbiased sampling
  • Random sampling
  • Sampling bias (CORRECT)
  • Geographically limited sampling

29. What best describes a sample size?

  • A subset of the population excluding outliers
  • A subset of the population between the 25th and 50th percentile
  • A random subset of the population
  • A subset that is representative of the population as a whole (CORRECT)

30. Fill in the blank: In order to have a strong and thorough analysis, a data analyst must verify _____.

  • data manipulation
  • data engineering
  • data integrity (CORRECT)
  • data replication

31. A financial analyst imports a dataset to their computer from a storage device. As it’s being imported, the connection is interrupted, which compromises the data. Which of the following processes caused the compromise?

  • Data gathering
  • Data analysis
  • Data transfer (CORRECT)
  • Data manipulation

32. You are working for a global technology company. You have a dataset with the company’s total cell phone sales by country from 2015 to present. Based on the data you have, what questions are you able to answer?

  • What was the effect on sales when a new phone model was launched?
  • What countries have the most cell phone sales in the past three years? (CORRECT)
  • What was the effect on sales when new phone features were introduced?
  • What are the mean cell phone sales for each country since 2010?

33. A data analyst is working on a project around a national supply chain. They have a dataset with lots of relevant data from about half of the country. However, they decide to generate new data that represents the entire nation. What type of insufficient data does this scenario describe?

  • Geographically limited data (CORRECT)
  • Data that keeps updating
  • Outdated data
  • Data from only one source

34. A company has multiple retail chain stores. Each store’s database is located onsite and used for various purposes. Which of the following processes could compromise data integrity?

  • Data transfer
  • Data gathering
  • Data replication (CORRECT)
  • Data cleaning

35. A data analyst is given a dataset for analysis. To use the template for this dataset, click the link below and select “Use Template.”

  • Identifying the best paying client between January and November of 2014
  • Identifying the most profitable clients between January and November of 2014 (CORRECT)
  • Identifying the worst paying client between March and December of 2014 (CORRECT)
  • Identifying the least profitable clients between January and November of 2014 (CORRECT)

36. A high school principal is estimating the total number of students that will attend an upcoming event. She assumes that the older students are unlikely to attend and decides to only survey the first-year students. What issue will the principal face when calculating her estimation? 

  • The sample is too small.
  • The sample should be the older students. 
  • The sample exhibits sampling randomness.
  • The sample exhibits sampling bias. (CORRECT)

37. Fill in the blank: _____ is the process of changing data to make it more organized and easier to read.

  • Data replication
  • Data transfer
  • Data gathering
  • Data manipulation (CORRECT)

38. A company is trying to learn more about their customer base. They would like to conduct a survey to understand why their customers chose their brand. How should the company survey its customers?

  • Conduct a survey with customers who have purchased more than five products
  • Conduct a survey with a representative sample of their customer population (CORRECT)
  • Conduct a survey of customers who purchased a different brand
  • Conduct a survey of customers that live in high-income areas

39. A candy manufacturer finds an even distribution of sales across all age ranges of customers who purchase their products. The manufacturer decides to conduct a survey to learn more about its customer base. Due to age requirements, they can only send the survey to customers who are 21 years or older. This scenario can be described as what?

  • Sampling bias (CORRECT)
  • Down sampling bias
  • Unbiased sampling
  • Upsampling bias

40. A data analyst retrieves a sample of their data that is roughly representative of the population as a whole. They realize that there will be some error in their sample results because they didn’t sample the entire population. What is this error called?

  • Margin of error
  • Sampling error (CORRECT)
  • Mean squared error
  • Population error

The Importance of Integrity: Conclusion

Data integrity is essential to sound decision-making. This part of the course highlighted why integrity matters and how data is generated. You also learned about the techniques analysts use to decide what data to collect for analysis, and about structured versus unstructured data.

Finally, you discovered different data types and formats. Join the course on Coursera to dive deeper into these topics and improve your understanding of data management!