COURSE 3: GO BEYOND THE NUMBERS: TRANSLATE DATA INTO INSIGHTS

Module 2: Explore Raw Data

GOOGLE ADVANCED DATA ANALYTICS PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

INTRODUCTION – Explore Raw Data

Finding stories in data using EDA is all about organizing and interpreting raw data. Python can help you do this quickly and effectively. You’ll learn how to use Python to perform the EDA practices of discovering and structuring.

Learning Objectives

  • Identify ethical issues that may come up during the data “discovering” practice of EDA
  • Use Python to merge or join data based on defined criteria
  • Use Python to sort and/or filter data
  • Use relevant Python libraries for cleaning raw data
  • Recognize opportunities for creating hypotheses based on raw data
  • Recognize when and how to communicate status updates and questions to key stakeholders
  • Apply Python tools to examine raw data structure and format
  • Use the PACE workflow to understand whether given data is adequate and applicable to a data science project
  • Differentiate between the common formats of raw data sources (JSON, tabular, etc.) and data types

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: DISCOVERING IS THE BEGINNING OF AN INVESTIGATION

1. Fill in the blank: Tabular, XML, CSV, and JSON files are all types of _____.

  • data formats (CORRECT)
  • data types
  • spreadsheets
  • Python functions

Correct: Tabular, XML, CSV, and JSON files are all types of data formats.

2. It is a data professional’s responsibility to understand data sources because the data’s origin affects its reliability.

  • True (CORRECT)
  • False

Correct: It is a data professional’s responsibility to understand data sources because the data’s origin affects its reliability. Understanding data sources involves determining how and when to contact the people who either generated the data or are in charge of delivering it in order to inform data discovery.

3. Which Python method returns the total number of entries and the data types of individual data entries in a dataset?

  • return()
  • total()
  • number()
  • info() (CORRECT)

Correct: info() returns the total number of entries and the data types of the columns in a dataset. In pandas it is called on a DataFrame, as in df.info(), and the name is lowercase.
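
As an illustration, here is a minimal sketch of info() on a small DataFrame. The DataFrame and its column names are invented for this example and do not come from the course.

```python
import io
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "city": ["Lagos", "Lima", "Oslo"],
    "strikes": [12, 7, 3],
})

# info() writes a summary: entry count, column names, non-null counts, dtypes.
# Passing buf captures the text so it can be inspected as a string.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```

Calling df.info() with no arguments prints the same summary directly.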

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: UNDERSTAND DATA FORMAT

1. Which of the following statements will convert the 'time' column into a datetime data type?

  • ['time'] = pd.datetime(df['time'])
  • df['time'] = pd.to_datetime('time')
  • df['time'] = pd.to_time(df['datetime'])
  • df['time'] = pd.to_datetime(df['time']) (CORRECT)

Correct: The statement df['time'] = pd.to_datetime(df['time']) will convert the 'time' column into a datetime data type.
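
The conversion can be tried on a tiny DataFrame. This is a sketch only; the timestamp strings below are assumed sample values.

```python
import pandas as pd

# Hypothetical data: timestamps stored as plain strings.
df = pd.DataFrame({"time": ["2021-01-05 08:30:00", "2021-02-10 17:45:00"]})

# Convert the column and assign it back, as in the quiz answer.
df["time"] = pd.to_datetime(df["time"])

# The column now has a datetime dtype, so the .dt accessor is available.
print(df["time"].dtype)
print(df["time"].dt.year.tolist())
```

Note that the result must be assigned back to df['time']; pd.to_datetime() returns a new Series and does not change the column in place.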

2. What Python method formats data into a new string representing date and time using a date, time, or datetime object?

  • strftime() (CORRECT)
  • head()
  • fig.show()
  • div()

Correct: strftime() formats data into a new string representing date and time using a date, time, or datetime object.
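
A short sketch of strftime() using the standard library. The timestamp and the format string are chosen for illustration only.

```python
from datetime import datetime

# A hypothetical timestamp chosen for this sketch.
moment = datetime(2023, 7, 4, 9, 5)

# strftime() returns a new string; the datetime object itself is unchanged.
label = moment.strftime("%Y-%m-%d %H:%M")
print(label)  # 2023-07-04 09:05
```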

3. A data professional is creating a bar chart in Python. To label the y-axis Sales to Date, the data professional could use the following statement: plt.ylabel('Sales to Date').

  • True (CORRECT)
  • False

Correct: The statement plt.ylabel('Sales to Date') labels the y-axis of a bar chart Sales to Date.
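
A minimal sketch of the labeling call in context, assuming plt refers to matplotlib.pyplot. The sales figures are made up, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical sales figures, invented for illustration.
months = ["Jan", "Feb", "Mar"]
sales = [120, 150, 90]

fig, ax = plt.subplots()
ax.bar(months, sales)

# plt.ylabel() sets the y-axis label of the current axes.
plt.ylabel("Sales to Date")
```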

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: CREATE STRUCTURE FROM RAW DATA

1. Fill in the blank: Grouping is a structuring method that enables data professionals to _____ individual observations of a variable into different categories or classes.

  • classify
  • rank
  • aggregate (CORRECT)
  • disperse

Correct: Grouping is a structuring method that enables data professionals to aggregate individual observations of a variable into different categories or classes.
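
In pandas, grouping is commonly done with groupby() followed by an aggregation. The sketch below uses invented data and assumes sum() is the aggregation of interest.

```python
import pandas as pd

# Hypothetical observations, invented for this sketch.
df = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "sales": [10, 20, 30, 40, 50],
})

# Group rows by category, then aggregate each group with sum().
totals = df.groupby("region")["sales"].sum()
print(totals)
```

Other aggregations such as mean() or count() can be swapped in for sum().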

2. Which of the following Python statements will create a list called grade_order that starts with Preschool?

  • order_grade = ['Preschool', 'Kindergarten', 'Elementary School', 'Middle School', 'High School']
  • order = ['Preschool_Grade', 'Kindergarten_Grade', 'Elementary School_Grade', 'Middle School_Grade', 'High School_Grade']
  • grade_order ('Preschool', 'Kindergarten', 'Elementary School', 'Middle School', 'High School')
  • grade_order = ['Preschool', 'Kindergarten', 'Elementary School', 'Middle School', 'High School'] (CORRECT)

Correct: The statement grade_order = ['Preschool', 'Kindergarten', 'Elementary School', 'Middle School', 'High School'] will set the grade order to start with Preschool.
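
The quiz only asks for the list itself. One plausible use, shown here as an assumption rather than the course's own method, is to make a column an ordered categorical so that sorting follows grade_order instead of alphabetical order. The DataFrame is hypothetical.

```python
import pandas as pd

grade_order = ['Preschool', 'Kindergarten', 'Elementary School',
               'Middle School', 'High School']

# Hypothetical data, invented for this sketch.
df = pd.DataFrame({"grade": ["High School", "Preschool", "Middle School"]})

# An ordered categorical sorts by position in grade_order.
df["grade"] = pd.Categorical(df["grade"], categories=grade_order, ordered=True)
sorted_df = df.sort_values("grade")
print(sorted_df["grade"].tolist())
```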

3. A data professional can use the concat function to join two or more dataframes.

  • True (CORRECT)
  • False

Correct: A data professional can use the concat function to join two or more dataframes. In pandas, it’s written as pd.concat().
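
A minimal sketch of pd.concat() stacking two DataFrames that share the same columns. The DataFrames q1 and q2 are invented for illustration.

```python
import pandas as pd

# Two hypothetical DataFrames with identical columns.
q1 = pd.DataFrame({"month": ["Jan", "Feb"], "sales": [100, 120]})
q2 = pd.DataFrame({"month": ["Apr", "May"], "sales": [90, 130]})

# Stack the rows of q2 under q1; ignore_index=True renumbers the rows 0..3.
combined = pd.concat([q1, q2], ignore_index=True)
print(combined)
```

Without ignore_index=True, each DataFrame keeps its original row labels, so the index of the result would repeat 0 and 1.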

QUIZ: MODULE 2 CHALLENGE

1. What are some strategies data professionals use to understand the source of a dataset? Select all that apply.

  • Ensure data supports the data professional’s hypothesis.
  • Investigate whether the data originator has any financial stake in the dataset. (CORRECT)
  • Request relevant information from the data owners. (CORRECT)
  • Confirm the data owners have experience collecting data. (CORRECT)

Correct!

2. Fill in the blank: A data storage file saved in a JavaScript format, also known as a _____ file, may contain nested objects.

  • CSV
  • spreadsheet
  • JSON (CORRECT)
  • JPG

Correct!
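
To make the idea of nested objects concrete, here is a small sketch using Python's built-in json module. The record is invented for illustration.

```python
import json

# A hypothetical JSON document with a nested "address" object.
raw = '{"name": "Ada", "address": {"city": "Lima", "zip": "15001"}, "visits": 3}'

# json.loads() turns JSON text into Python dicts, lists, strings, and numbers.
record = json.loads(raw)

# Nested objects become nested dictionaries.
city = record["address"]["city"]
print(city)              # Lima
print(record["visits"])  # 3
```

The quoted value "15001" stays a string while the unquoted 3 becomes an integer, which shows how JSON keeps strings and numbers distinct.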

3. What type of data is gathered outside of an organization, but directly from the original source?

  • First-party
  • Fourth-party
  • Third-party
  • Second-party (CORRECT)

Correct!

4. Which of the following statements correctly uses the head() function to return the first 25 rows of a dataset?

  • df.head(25) (CORRECT)
  • df.head(rows=25)
  • df.head(25.df)
  • head=25

Correct!
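
A quick sketch of head() with and without an argument, using an invented 100-row DataFrame.

```python
import pandas as pd

# Hypothetical dataset with 100 rows.
df = pd.DataFrame({"value": range(100)})

# head(n) returns the first n rows as a new DataFrame.
first_25 = df.head(25)

# With no argument, head() returns the first 5 rows.
default_rows = df.head()

print(len(first_25), len(default_rows))  # 25 5
```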

5. Which of the following statements will assign the name Chicago Neighborhoods to a bar graph in Python?

  • plt.title("Chicago Neighborhoods") (CORRECT)
  • plt.show("Chicago Neighborhoods")
  • plt.xlabel("Chicago Neighborhoods")
  • plt.name("Chicago Neighborhoods")

Correct!
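
The difference between the title and an axis label can be seen in a short sketch, assuming plt is matplotlib.pyplot. The neighborhood names and counts are made up, and the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical categories and counts, invented for illustration.
neighborhoods = ["Loop", "Hyde Park", "Pilsen"]
counts = [4, 2, 3]

fig, ax = plt.subplots()
ax.bar(neighborhoods, counts)

plt.title("Chicago Neighborhoods")  # names the whole chart
plt.xlabel("Neighborhood")          # labels only the x-axis
```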

6. A data professional types the following partial code:

locationmode = 'Indonesia'
fig.update_layout(title_text='Languages',
                  geo_scope='Indonesia')
fig.show()

Which element of the code is used to render a graphic of the plot?

  • fig.show() (CORRECT)
  • fig.update_layout
  • geo_scope=
  • title_text =

Correct!

7. Which structuring method aggregates individual observations of a variable into buckets?

  • Filtering
  • Slicing
  • Grouping (CORRECT)
  • Merging

Correct!

8. Fill in the blank: A box plot is a data visualization that depicts the locality, skew, and _____ of groups of values within quartiles.

  • temperature
  • spread (CORRECT)
  • height
  • area

Correct!

9. What are some of the benefits of JSON files for data professionals? Select all that apply.

  • Eliminate nested objects within the files
  • Readability in almost any programming language (CORRECT)
  • Easily distinguish between strings and numbers (CORRECT)
  • Small message size (CORRECT)

Correct!

10. Which of the following statements will assign the name Salzburg Restaurants to a bar graph in Python?

  • plt.title("Salzburg Restaurants") (CORRECT)
  • plt.name("Salzburg Restaurants")
  • plt.show("Salzburg Restaurants")
  • plt.xlabel("Salzburg Restaurants")

Correct!

11. Which Python function is used to render a graphic of a plot called graph?

  • show.pt(graph)
  • graph.display()
  • graph.show() (CORRECT)
  • plot.graph()

Correct!

12. What type of data is gathered outside of an organization and aggregated?

  • Fourth-party
  • First-party
  • Third-party (CORRECT)
  • Second-party

Correct!

13. Which of the following statements correctly uses the head() function to return the first 5 rows of a dataset?

  • df.head(5) (CORRECT)
  • df.head(5.df)
  • head=5
  • df.head(rows=5)

Correct!

14. Fill in the blank: The Python function fig.show() is used to render a _____ of a plot.

  • template
  • dashboard
  • mirror image
  • graphic (CORRECT)

Correct!

15. Which structuring method combines two different data frames along a specified starting column?

  • Filtering
  • Sorting
  • Merging (CORRECT)
  • Grouping

Correct!
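
A minimal sketch of merging in pandas, assuming an inner join on a shared key column. The DataFrames and the column name customer_id are invented for illustration.

```python
import pandas as pd

# Two hypothetical DataFrames that share a key column.
customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "name": ["Ana", "Ben", "Chi"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3],
                       "amount": [25.0, 40.0, 15.0]})

# Combine rows whose customer_id matches in both DataFrames.
merged = pd.merge(customers, orders, on="customer_id", how="inner")
print(merged)
```

Ben has no orders, so an inner join leaves that customer out; how="left" would keep the row with a missing amount.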

16. Fill in the blank: A box plot is a data visualization that depicts the spread, skew, and _____ of groups of values within quartiles.

  • speed
  • intensity
  • locality (CORRECT)
  • timing

Correct!

17. What is the data storage file format for JavaScript?

  • spreadsheet
  • XML
  • CSV
  • JSON (CORRECT)

Correct!

18. Which structuring method selects a smaller part of a dataset based on specified parameters, then uses it for analysis?

  • Organizing
  • Sorting
  • Grouping
  • Filtering (CORRECT)

Correct!
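
Filtering in pandas is often written with a Boolean mask. The sketch below uses invented data and an assumed threshold of 20 degrees.

```python
import pandas as pd

# Hypothetical temperature readings, invented for this sketch.
df = pd.DataFrame({
    "city": ["Cairo", "Oslo", "Lima", "Reykjavik"],
    "temp_c": [31, 12, 27, 8],
})

# The mask is True for rows that meet the condition.
mask = df["temp_c"] > 20

# Indexing with the mask keeps only the matching rows.
warm = df[mask]
print(warm["city"].tolist())  # ['Cairo', 'Lima']
```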

19. Fill in the blank: A _____ is a data visualization that depicts the locality, spread, and skew of groups of values within quartiles.

  • Gantt chart
  • box plot (CORRECT)
  • density map
  • scatter plot

Correct!

20. What are some strategies data professionals use to understand the source of a dataset? Select all that apply.

  • Verify the data source to ensure it will align with stakeholder beliefs.
  • Give extra weight to duplicate records to highlight the multiple responses.
  • If questions arise during discovery, contact the data engineers for information. (CORRECT)
  • Confirm the database owners have experience storing data. (CORRECT)

Correct!

21. What type of data is gathered from inside a company’s own organization?

  • Third-party
  • First-party (CORRECT)
  • Second-party
  • Fourth-party

Correct!

22. Which of the following statements will assign the name Kuwait Museums to a bar graph in Python?

  • plt.xlabel("Kuwait Museums")
  • plt.name("Kuwait Museums")
  • plt.title("Kuwait Museums") (CORRECT)
  • plt.show("Kuwait Museums")

Correct!

23. What are some strategies data professionals use to understand the source of a dataset? Select all that apply.

  • Reduce outliers by ensuring data comes from a small sample.
  • Request relevant information from the team members who supplied the data. (CORRECT)
  • Determine where the data originally came from. (CORRECT)
  • Confirm the original data owner has no financial stake in the data’s output. (CORRECT)

Correct!

24. Which of the following statements correctly uses the head() function to return the first 10 rows of a dataset?

  • head=10
  • df.head(10.df)
  • df.head(10) (CORRECT)
  • df.head(rows=10)

Correct!

25. Fill in the blank: _____ is data gathered from inside your own organization.

  • First-party (CORRECT)
  • Third-party
  • Fourth-party
  • Second-party

Correct: First-party data is data gathered from inside your own organization.

26. Why does a data professional use the Python methods describe() and sample(), and the attributes size and shape?

  • To save a dataset
  • To share a dataset
  • To transfer a dataset
  • To learn about a dataset (CORRECT)

Correct: A data professional uses the methods describe() and sample(), and the attributes size and shape, to learn about a dataset.
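
The four tools can be tried together on a small DataFrame. This is a sketch with invented data. Note that describe() and sample() are called with parentheses, while size and shape are attributes and take none.

```python
import pandas as pd

# Hypothetical measurements, invented for this sketch.
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180],
    "weight_kg": [50, 60, 70, 80],
})

stats = df.describe()                     # count, mean, std, min, quartiles, max
one_row = df.sample(n=1, random_state=0)  # one randomly chosen row
n_values = df.size                        # total number of cells: 8
dims = df.shape                           # (rows, columns): (4, 2)

print(stats.loc["mean"])
print(n_values, dims)
```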

27. In the statement df['date'].dt.strftime('%Y-W%V'), which element states that the year should be included in the new column format?

  • Hyphen
  • Parentheses
  • %Y (CORRECT)
  • Square brackets

Correct: In the statement df['date'].dt.strftime('%Y-W%V'), the element %Y states that the year should be included in the new column format.
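
Here is a sketch of that format string applied to a datetime column. The dates and the new column name week are assumptions. %V is the ISO 8601 week number, and its support depends on the platform's C library, so treat this as illustrative.

```python
import pandas as pd

# Hypothetical dates, invented for this sketch.
df = pd.DataFrame({"date": pd.to_datetime(["2024-03-15", "2024-06-03"])})

# %Y inserts the four-digit year; %V inserts the ISO week number.
df["week"] = df["date"].dt.strftime("%Y-W%V")
print(df["week"].tolist())
```

Around New Year the ISO week can belong to a different year than the calendar date; the directive %G gives the ISO year that pairs with %V.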

28. What structuring method enables data professionals to divide information into smaller parts in order to facilitate efficient examination and analysis from different viewpoints?

  • Slicing (CORRECT)
  • Grouping
  • Filtering
  • Extracting

Correct: Slicing enables data professionals to break down information into smaller parts.
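
In pandas, slicing is commonly done with iloc (by position) and loc (by label). The sketch below uses an invented DataFrame.

```python
import pandas as pd

# Hypothetical scores, invented for this sketch.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "score": [88, 92, 75, 60, 99],
    "grade": ["B", "A", "C", "D", "A"],
})

# Rows at positions 1, 2, and 3 (the end position 4 is excluded).
middle_rows = df.iloc[1:4]

# All rows, but only the listed columns.
score_only = df.loc[:, ["id", "score"]]

print(middle_rows)
print(score_only)
```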

29. Fill in the blank: A box plot is a data visualization that depicts the locality, spread, and _____ of groups of values within quartiles.

  • variety
  • flow
  • skew (CORRECT)
  • meaning

Correct: A box plot is a data visualization that depicts the locality, spread, and skew of groups of values within quartiles. Box plots provide information on the variability and dispersion of data by depicting how the values in the data are spread out.
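
The numbers behind a box plot can be computed directly. This sketch uses invented values and pandas' default quantile interpolation; plotting libraries may compute quartiles slightly differently.

```python
import pandas as pd

# Hypothetical values, invented for this sketch; 30 sits far from the rest.
values = pd.Series([2, 4, 4, 5, 7, 9, 10, 12, 30])

q1 = values.quantile(0.25)      # lower edge of the box
median = values.quantile(0.5)   # line inside the box (locality)
q3 = values.quantile(0.75)      # upper edge of the box
iqr = q3 - q1                   # height of the box (spread)

print(q1, median, q3, iqr)
```

The median sitting at the midpoint of the box while one value lies far above it is the kind of pattern a box plot makes visible when judging skew.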