
Course 4 – Process Data from Dirty to Clean Quiz Answers
Week 1: The Importance of Integrity
GOOGLE DATA ANALYTICS PROFESSIONAL CERTIFICATION
Complete Study Guide
The Importance of Integrity: Introduction
When it comes to making decisions, data integrity is key. Data integrity refers to the accuracy, completeness, consistency, and trustworthiness of data over its entire life cycle. In this post, we work through the quiz questions for this module, including one potential problem associated with data manipulation that analysts must be aware of when analyzing data. Stakeholders need to be sure that data has been collected, stored, and processed correctly before they can trust any insights derived from it. That’s why the Google Data Analytics Professional Certificate program on Coursera devotes a module to helping you understand the importance of data integrity and how to maintain it.
You can expect to learn about the techniques analysts use to decide which data to collect for analysis, as well as structured and unstructured data, different data formats, and more.
Learning Objectives
- Describe statistical measures associated with data integrity including statistical power, hypothesis testing, and margin of error
- Describe strategies that can be used to address insufficient data
- Discuss the importance of sample size with reference to sample bias and random samples
- Describe the relationship between data and related business objectives
- Define data integrity with reference to types and risks
- Discuss the importance of pre-cleaning activities
Test your knowledge on data integrity and analytics objectives
1. Which of the following principles are key elements of data integrity? Select all that apply.
- Accuracy (Correct)
- Consistency (Correct)
- Trustworthiness (Correct)
- Selectivity
Correct: Data integrity is the accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle.
2. Which process do data analysts use to make data more organized and easier to read?
- Data uniformity
- Data manipulation (Correct)
- Data transfer
- Data replication
Correct: To make data more organized and easier to read, data analysts use data manipulation.
3. Before analysis, a company collects data from countries that use different date formats. Which of the following updates would improve the data integrity?
- Change all of the dates to the same format (Correct)
- Remove data in an unfamiliar date format
- Organize the data by country
- Leave the dates in their current formats
Correct: Changing all of the dates to the same format would improve the data integrity.
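As a minimal sketch of what changing all of the dates to the same format could look like, the Python snippet below parses dates written in a few known formats and rewrites them as ISO 8601 (YYYY-MM-DD). The function name `to_iso_date`, the country codes, and the format table are illustrative assumptions, not part of the course material.

```python
from datetime import datetime

# Hypothetical mapping from a record's country to the date format it uses.
COUNTRY_DATE_FORMATS = {
    "US": "%m/%d/%Y",  # month/day/year
    "GB": "%d/%m/%Y",  # day/month/year
    "JP": "%Y/%m/%d",  # year/month/day
}

def to_iso_date(raw_date: str, country: str) -> str:
    """Parse raw_date using the country's format and return YYYY-MM-DD."""
    fmt = COUNTRY_DATE_FORMATS[country]
    return datetime.strptime(raw_date, fmt).date().isoformat()

# Illustrative records: the same text can mean different days in different countries.
records = [("03/04/2021", "US"), ("03/04/2021", "GB"), ("2021/03/04", "JP")]
cleaned = [to_iso_date(raw, country) for raw, country in records]
print(cleaned)  # ['2021-03-04', '2021-04-03', '2021-03-04']
```

The sketch keys the format on the country because a value such as 03/04/2021 is ambiguous on its own. That ambiguity is also why deleting or ignoring unfamiliar formats would hurt integrity rather than help it.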
4. Which of the following processes helps ensure a close alignment of data and business objectives?
- Maintain data integrity (Correct)
- Completing data replication
- Having data update automatically during analysis
- Transferring data multiple times
Correct: Maintaining data integrity helps ensure a close alignment of data and business objectives because the data is likely to be accurate, complete, consistent, and trustworthy.
5. When gathering data through a survey, companies can save money by surveying 100% of a population.
- False (Correct)
- True
Correct: Surveying 100% of a population is ideal, but gathering data from an entire population can be very expensive.
Test your knowledge on insufficient data
1. What should an analyst do if they do not have the data needed to meet a business objective? Select all that apply.
- Gather related data on a small scale and request additional time to find more complete data. (Correct)
- Continue with the analysis using data from less reliable sources.
- Create and use hypothetical data that aligns with analysis predictions.
- Perform the analysis by finding and using proxy data from other datasets. (Correct)
Correct: If an analyst does not have the data needed to meet a business objective, they should gather related data on a small scale and request additional time. Then, they can find more complete data or perform the analysis by finding and using proxy data from other datasets.
2. Which of the following are limitations that might lead to insufficient data? Select all that apply.
- Data that updates continually (Correct)
- Duplicate data
- Outdated data (Correct)
- Data from a single source (Correct)
Correct: Limitations that might lead to insufficient data include data that updates continually, outdated data, and data from a single source.
3. A data analyst wants to find out how many people in Utah have swimming pools. It’s unlikely that they can survey every Utah resident. Instead, they survey enough people to be representative of the population. This describes what data analytics concept?
- Confidence level
- Margin of error
- Sample (Correct)
- Statistical significance
Correct: This describes a sample, which is a part of a population that is representative of the whole.
Test your knowledge on testing your data
1. A research team runs an experiment to determine if a new security system is more effective than the previous version. What type of results are required for the experiment to be statistically significant?
- Results that are real and not caused by random chance (Correct)
- Results that are unlikely to occur again
- Results that are inaccurate and should be ignored
- Results that are hypothetical and in need of more testing
Correct: In order for an experiment to be statistically significant, the results should be real and not caused by random chance.
2. In order to have a high confidence level in a customer survey, what should the sample size accurately reflect?
- The most valuable members of the population
- The trends from other customer surveys
- The predictions of stakeholders
- The entire population (Correct)
Correct: In order to have a high confidence level in a customer survey, the sample size should accurately reflect the entire population.
3. A data analyst determines an appropriate sample size for a survey. They can check their work by making sure the confidence level percentage plus the margin of error percentage add up to 100%.
- True
- False (Correct)
Correct: The confidence level percentage and margin of error percentage do not have to add up to 100%. They are independent of each other.
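To illustrate why the two values are independent, here is a rough sketch of a sample size calculation in Python. It assumes a large population and uses a common textbook approximation for estimating a proportion, n = z² · p(1 − p) / e². This is not necessarily the method behind the calculator used in the course, and the function name `required_sample_size` is made up for this example.

```python
import math
from statistics import NormalDist

def required_sample_size(confidence_level: float, margin_of_error: float,
                         proportion: float = 0.5) -> int:
    """Approximate sample size for estimating a proportion in a large population.

    Uses n = z^2 * p * (1 - p) / e^2, where z is the two-sided critical value
    for the confidence level. proportion=0.5 is the most conservative choice.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n = (z ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)
    return math.ceil(n)

# Confidence level and margin of error are chosen separately.
print(required_sample_size(0.95, 0.05))  # 385
print(required_sample_size(0.99, 0.05))  # 664
print(required_sample_size(0.95, 0.03))  # 1068
```

A 99% confidence level with a 5% margin of error, or a 95% confidence level with a 3% margin of error, are both valid combinations. Neither pair adds up to 100%, and each simply calls for a larger sample.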
Test your knowledge on margin of error
1. Fill in the blank: Margin of error is the _____ amount that the sample results are expected to differ from those of the actual population.
- maximum (Correct)
- minimum
- median
- average
Correct: Margin of error is the maximum amount that the sample results are expected to differ from those of the actual population.
2. In a survey about a new cleaning product, 75% of respondents report they would buy the product again. The margin of error for the survey is 5%. Based on the margin of error, what percentage range reflects the population’s true response?
- Between 75% and 80%
- Between 73% and 78%
- Between 70% and 80% (Correct)
- Between 70% and 75%
Correct: Based on the margin of error, between 70% and 80% accurately reflects the population’s true response.
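The arithmetic behind this answer is the sample result minus and plus the margin of error: 75% − 5% = 70% and 75% + 5% = 80%. A minimal Python sketch follows; the helper name `margin_of_error_range` is an assumption for illustration.

```python
def margin_of_error_range(sample_percent: float, margin_percent: float) -> tuple[float, float]:
    """Return the (low, high) percentage range implied by a margin of error."""
    # Percentages cannot fall below 0 or rise above 100, so clamp the ends.
    low = max(0.0, sample_percent - margin_percent)
    high = min(100.0, sample_percent + margin_percent)
    return low, high

low, high = margin_of_error_range(75.0, 5.0)
print(f"Between {low:g}% and {high:g}%")  # Between 70% and 80%
```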
Process Data from Dirty to Clean Weekly Challenge 1
1. Which of the following conditions are necessary to ensure data integrity? Select all that apply.
- Accuracy (Correct)
- Statistical power
- Completeness (Correct)
- Privacy
Correct: Accuracy and completeness are necessary to ensure data integrity.
2. What is one potential problem associated with data manipulation that analysts must be aware of?
- Data manipulation can help organize a dataset.
- Data manipulation can introduce errors. (Correct)
- Data manipulation can make a dataset easier to read.
- Data manipulation can separate a dataset among different locations.
Correct: Data manipulation is the process of changing data to make it more organized and easier to read. However, it can sometimes introduce errors.
3. A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Which of the following questions can the analyst use this dataset to address? Select all that apply.
- What was the reason for the population increase in a certain country?
- What was the effect of migration on the population of a certain country?
- What was the difference in population between two specific countries in 2018? (Correct)
- What was the average population of a certain country from 2015 through 2020? (Correct)
Correct: The analyst could use the dataset to find the average population of a certain country from 2015 through 2020 and the difference in population between two specific countries in 2018.
4. A data analyst is given a dataset for analysis. The analyst notices a limitation with the data in rows 8 and 9. What is the limitation?
- Row 8 and row 9 show the wrong currency.
- Row 9 is a duplicate of row 8. (Correct)
- Row 9 needs more data.
- Row 8 is not in the correct format.
Correct: Row 9 is a duplicate of row 8. Duplicate data is a limitation because it will lead to faulty analysis.
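A duplicate like the one in rows 8 and 9 can also be found programmatically. The sketch below is one simple way to do it in Python, assuming each row is a list of cell values. The function name and the sample rows are hypothetical placeholders, not the contents of the course dataset.

```python
def find_duplicate_rows(rows):
    """Return (first_index, duplicate_index) pairs for rows that repeat exactly."""
    first_seen = {}
    duplicates = []
    for index, row in enumerate(rows):
        key = tuple(row)  # lists are not hashable, tuples are
        if key in first_seen:
            duplicates.append((first_seen[key], index))
        else:
            first_seen[key] = index
    return duplicates

# Hypothetical placeholder rows; these are not the values in the course dataset.
rows = [
    ["A-100", "2021-01-05", "19.99"],
    ["A-101", "2021-01-06", "5.00"],
    ["A-101", "2021-01-06", "5.00"],  # exact repeat of the previous row
]
print(find_duplicate_rows(rows))  # [(1, 2)]
```

The indexes are zero-based positions in the list, so a spreadsheet with a header row would need an offset to map them back to row numbers.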
5. A data analyst is working on a project about the global supply chain. They have a dataset with lots of relevant data from Europe and Asia. However, they decide to generate new data that represents all continents. What type of insufficient data does this scenario describe?
- Data that keeps updating
- Data from only one source
- Data that’s geographically limited (Correct)
- Data that’s outdated
Correct: This example describes data that is insufficient because it’s geographically limited. If the analytics project has a global focus, the dataset should also be global.
6. In the data analysis process, how does a sample relate to a population?
- A sample is a part of a population that is representative of the population. (Correct)
- A sample is an ideal example taken from a population.
- A sample is a duplicate selection of data that is taken from the population.
- A sample is an average of all the data that represents the population.
Correct: A sample relates to a population by representing a population at a smaller scale.
7. A restaurant gathers data about a new dish by providing free samples to parties of six or more diners. What does this scenario describe?
- Unbiased sampling
- Random sampling
- Geographically limited sampling
- Sampling bias (Correct)
Correct: This scenario describes sampling bias because parties of six or more are not representative of the population as a whole.
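The difference between a biased selection and a random sample can be sketched in a few lines of Python. The party sizes below are invented for illustration, and `simple_random_sample` is a hypothetical helper built on the standard library's `random.Random.sample`.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical party sizes for one evening of service.
party_sizes = [2, 2, 4, 1, 6, 3, 2, 8, 4, 2, 5, 3]

# Biased: only parties of six or more are ever offered the dish.
biased = [size for size in party_sizes if size >= 6]

# Random: every party has the same chance of being selected.
unbiased = simple_random_sample(party_sizes, 4, seed=42)

print(biased)  # [6, 8]
print(unbiased)
```

In the biased selection, smaller parties can never appear, so feedback from most diners is missing by design.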
8. Data and business objectives might not align for a number of reasons. Which of the following issues can prevent alignment? Select all that apply.
- Sampling bias (Correct)
- Data integrity
- Insufficient data (Correct)
- Data visualization
Correct: Insufficient data and sampling bias can prevent alignment.
9. Fill in the blank: Data _____ refers to the accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle.
- integrity (Correct)
- analysis
- sampling
- replication
Correct: Data integrity is the accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle.
10. A healthcare company keeps copies of their data at several locations across the country. The data becomes compromised because each location creates a copy of the original at different times of day. Which of the following processes caused the compromise?
- Data manipulation
- Data transfer
- Data gathering
- Data replication (Correct)
Correct: Data replication caused the compromise. Data replication is the process of storing data in multiple locations. If not done properly, replication can compromise integrity and cause inconsistencies.
11. A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Based on the available data, an analyst would be able to determine the reasons behind a certain country’s population increase from 2016 to 2017.
- True
- False (Correct)
Correct: The dataset contains only population totals, so the analyst would need more data to determine the reasons behind the population increase.
12. A data analyst at a nonprofit organization is working with a dataset about a summer fundraiser. Although they have a lot of useful data by the end of the month, they recognize that the data is insufficient. So, they decide to wait until the end of the season to begin working with the dataset. Which type of insufficient data does this example describe?
- Data that keeps updating (Correct)
- Outdated data
- Geographically limited data
- Data from only one source
Correct: This example describes insufficient data that keeps updating. If a dataset keeps updating, that means the data is still incoming and might be incomplete.
13. Fill in the blank: Sampling bias in data collection happens when a sample isn’t representative of _____.
- the population as a whole (Correct)
- the population most affected by the data
- a dataset about the population
- a subset of the population
Correct: Sampling bias in data collection happens when a sample isn’t representative of the population as a whole.
14. Sometimes during analysis, an analyst discovers that it’s necessary to adjust the business objective. When this happens, the analyst should take the initiative to do so without involving others in order to be respectful of their time.
- True
- False (Correct)
Correct: If a data analyst believes the business objective should be adjusted, it’s important to first have a discussion with stakeholders.
15. A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Which of the following questions would the analyst need more data to address?
- What was the population of a certain country in 2020?
- Which country had the greatest population in 2015?
- Which country had the smallest population in 2017?
- What was the reason for the population increase in a certain country? (Correct)
Correct: The analyst would need more data to identify the reason for a population increase.
16. A restaurant wants to gather data about a new dish by giving out free samples and asking for feedback. Who should the restaurant give samples to?
- All diners (Correct)
- 80% of diners
- Diners who spend the most money on their meal
- Diners who are willing to pay for the samples
Correct: The restaurant should give samples to all diners.
17. A data analyst at a software company wants to learn more about industry competitors. Because the software industry has more mergers than any other field, the companies and their products are constantly evolving. The analyst has a dataset from three years ago, and they notice that many of the companies and products in the dataset have changed. What makes the analyst decide that the data is insufficient, so they should generate fresh data instead?
- It is geographically limited data
- It is data from only one source
- It is outdated data (Correct)
- It is data that keeps updating
Correct: This example describes outdated data, which is insufficient. If a dataset is outdated, that means the data is old and probably no longer relevant.
18. Fill in the blank: If a data analyst is using data that has been _____, the data will lack integrity and the analysis will be faulty.
- compromised (Correct)
- clean
- wide
- public
Correct: If a data analyst is using data that has been compromised, the data will lack integrity and the analysis will be faulty.
19. A financial analyst imports a dataset to their computer from a storage device. As it’s being imported, the connection is interrupted, which compromises the data. Which of the following processes caused the compromise?
- Data analysis
- Data manipulation
- Data transfer (Correct)
- Data gathering
Correct: Data transfer caused the compromise. When a data transfer is interrupted, it can result in an incomplete dataset.
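One common way to catch an incomplete transfer, offered here as an illustration rather than something the course prescribes, is to compare a checksum of the source file with a checksum of the copy. The Python sketch below uses SHA-256 from the standard library. The function names and the temporary demo files are assumptions made for this example.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def transfer_is_intact(source_path: str, copy_path: str) -> bool:
    """True if the copy has exactly the same bytes as the source."""
    return sha256_of_file(source_path) == sha256_of_file(copy_path)

# Demonstration with temporary files: one complete copy and one cut short.
with tempfile.TemporaryDirectory() as folder:
    source = os.path.join(folder, "source.csv")
    complete = os.path.join(folder, "complete.csv")
    truncated = os.path.join(folder, "truncated.csv")
    data = b"id,amount\n1,10\n2,20\n3,30\n"
    for path, content in [(source, data), (complete, data), (truncated, data[:12])]:
        with open(path, "wb") as handle:
            handle.write(content)
    complete_ok = transfer_is_intact(source, complete)
    truncated_ok = transfer_is_intact(source, truncated)

print(complete_ok, truncated_ok)  # True False
```

The truncated file stands in for an import whose connection dropped partway through: its checksum no longer matches the source, so the mismatch flags the dataset before any analysis begins.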
The Importance of Integrity: Conclusion
Data integrity is essential to successful decision-making. This part of the course highlighted why data integrity matters and how data is generated. You also learned about the techniques analysts use to decide what data to collect for analysis, as well as structured versus unstructured data.
Finally, you discovered different data types and formats. Join the learning experience in Coursera to dive deeper into these topics and improve your understanding of data management!