Data analyst is working with a data frame

Course 7 – Data Analysis with R Programming Quiz Answers

Week 3: Working with Data in R

GOOGLE DATA ANALYTICS PROFESSIONAL CERTIFICATE

Coursera RStudio Answers Study Guide

Working with Data in R INTRODUCTION

Working with data in R is an important step in the data analysis process. In this part of the Google Data Analytics Professional Certificate course on Coursera, you’ll learn how to use functions and other processes to structure, organize, and clean your data in R. You will explore the concept of a data frame, an object that stores tabular data, and learn how to work with data frames in R.

Additionally, you will consider potential sources of bias in datasets, such as selection bias or confounding variables, and understand how R can help detect these issues. Working through this course module gives you the skills needed to effectively manage your data before further analyzing it in more depth with other tools and techniques.

Learning Objectives

  • Discuss how R functions may be used to address issues of bias and relationship between data variables
  • Describe R functions that may be used to clean and organize data
  • Describe functions used to work with data frames including read_csv(), data(), and datapasta()
  • Discuss the difference between tibbles and tribbles
  • Compare and contrast data cleaning with different tools
  • Create and work with data in R

Test your knowledge on R data frames

1. Which of the following are best practices for creating data frames? Select all that apply.

  • All data stored should be the same type
  • Rows should be named
  • Columns should be named (Correct)
  • Each column should contain the same number of data items (Correct)

Correct: When creating data frames, columns should be named and each column should contain the same number of data items.
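The two best practices above can be sketched in base R. The `employees` data and its column names are made up for illustration; the second call shows what happens when columns do not contain the same number of items.

```r
# Hypothetical example data: three named columns, each with 3 items.
employees <- data.frame(
  name   = c("Ana", "Ben", "Cho"),
  age    = c(34, 28, 41),
  active = c(TRUE, FALSE, TRUE)
)

# Columns may hold different types, but every column has one value per row.
str(employees)

# Columns of unequal length (3 vs. 2) cannot be combined and raise an error.
result <- tryCatch(
  data.frame(name = c("Ana", "Ben", "Cho"), age = c(34, 28)),
  error = function(e) "error: columns differ in length"
)
print(result)
```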

2. Why are tibbles a useful variation of data frames?

  • Tibbles make printing easier (Correct)
  • Tibbles make changing the names of variables easier.
  • Tibbles can change the data types of inputs
  • Tibbles can create row names

Correct: Tibbles make printing easier. They also help you avoid overloading your console when working with large datasets. By default, a tibble prints only the first ten rows of a dataset and as many columns as fit on the screen.
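A minimal sketch of this printing behavior, assuming the tibble package is installed. It uses the built-in ToothGrowth data frame, and `tooth_tbl` is just an illustrative name.

```r
library(tibble)

# ToothGrowth is a built-in data frame with 60 rows.
tooth_tbl <- as_tibble(ToothGrowth)

# Printing the tibble shows only the first 10 rows and a note about the rest.
print(tooth_tbl)

# The data itself is unchanged: all 60 rows are still present.
nrow(tooth_tbl)
```

Printing the original data frame with `print(ToothGrowth)` would list every row, which is what slows the console on large datasets.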

3. Tidy data is a way of standardizing the organization of data within R.

  • True (Correct)
  • False

Correct: Tidy data refers to the principles that make data structures meaningful and easy to understand. It’s a way of standardizing the organization of data within R.

4. Which R function can be used to make changes to a data frame?

  • str()
  • mutate() (Correct)
  • head()
  • colnames()

Correct: The mutate() function can be used to make changes to a data frame.

Test your knowledge on cleaning data

1. A data analyst is cleaning their data in R. They want to be sure that their column names are unique and consistent to avoid any errors in their analysis. What R function can they use to do this automatically?

  • rename()
  • clean_names() (Correct)
  • select()
  • rename_with()

Correct: The clean_names() function will automatically make sure that column names are unique and consistent.
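As a rough illustration, assuming the janitor package (which provides clean_names()) is installed. The `messy` data frame and its column names are hypothetical.

```r
library(janitor)

# Hypothetical data frame with inconsistent column names.
# check.names = FALSE keeps the spaces in the names as typed.
messy <- data.frame(
  "First Name"  = c("Ana", "Ben"),
  "Total Sales" = c(100, 250),
  check.names = FALSE
)

# clean_names() rewrites the names in a consistent snake_case style.
cleaned <- clean_names(messy)
colnames(cleaned)
```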

2. You are working with the penguins dataset. You want to use the arrange() function to sort the data for the column bill_length_mm in ascending order. You write the following code:

penguins %>%

Add a code chunk to sort the column bill_length_mm in ascending order.

What is the shortest bill length in mm?

  • 33.1
  • 33.5
  • 34.0
  • 32.1 (Correct)

Correct: You add the code chunk arrange(bill_length_mm) to sort the column bill_length_mm in ascending order. The correct code is penguins %>% arrange(bill_length_mm). Inside the parentheses of the arrange() function is the name of the variable you want to sort. The code returns a tibble that displays the data for bill_length_mm from shortest to longest. The shortest bill length is 32.1mm.
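The same sorting pattern can be tried without the penguins data. This sketch assumes dplyr is installed and uses a small made-up data frame (`birds`) in place of the real dataset.

```r
library(dplyr)

# Hypothetical stand-in for the penguins data, with made-up measurements.
birds <- data.frame(
  id             = c("a", "b", "c"),
  bill_length_mm = c(39.1, 32.1, 46.5)
)

# arrange() sorts in ascending order by default.
sorted <- birds %>% arrange(bill_length_mm)

# Wrapping the column in desc() sorts from longest to shortest instead.
sorted_desc <- birds %>% arrange(desc(bill_length_mm))

print(sorted)
print(sorted_desc)
```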

3. A data analyst is working with customer information from their company’s sales data. The first and last names are in separate columns, but they want to create one column with both names instead. Which of the following functions can they use?

  • arrange()
  • unite() (Correct)
  • select()
  • separate()

Correct: The unite() function can be used to combine columns.

Test your knowledge on R functions

1. Which of the following functions can a data analyst use to get a statistical summary of their dataset? Select all that apply.

  • mean() (Correct)
  • ggplot2()
  • cor() (Correct)
  • sd() (Correct)

Correct: The sd(), cor(), and mean() functions can provide a statistical summary of the dataset using standard deviation, correlation, and mean.

2. A data analyst inputs the following command:

quartet %>% group_by(set) %>% summarize(mean(x), sd(x), mean(y), sd(y), cor(x, y))

Which of the functions in this command can help them determine how strongly related their variables are?

  • sd(x)
  • cor(x,y) (Correct)
  • sd(y)
  • mean(y)

Correct: The cor() function returns the correlation between two variables. This determines how strong the relationship between those two variables is.
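The question does not say where the quartet data comes from, so the sketch below substitutes R's built-in anscombe dataset, which stores four x/y pairs as columns x1..x4 and y1..y4. It computes the same statistics per set in base R instead of the group_by() and summarize() pipeline.

```r
# anscombe is a built-in dataset holding four x/y pairs: x1..x4 and y1..y4.
sets <- 1:4

quartet_stats <- data.frame(
  set    = sets,
  mean_x = sapply(sets, function(i) mean(anscombe[[paste0("x", i)]])),
  sd_x   = sapply(sets, function(i) sd(anscombe[[paste0("x", i)]])),
  cor_xy = sapply(sets, function(i) {
    # cor() measures how strongly x and y are linearly related.
    cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
  })
)

print(quartet_stats)
```

All four sets share nearly the same correlation (about 0.82), even though they look very different when plotted, which is why summary statistics alone can mislead.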

3. Fill in the blank: The bias function compares the actual outcome of the data with the _____ outcome to determine whether or not the model is biased.

  • desired
  • probable
  • final
  • predicted (Correct)

Correct: The bias function compares the actual outcome of the data with the predicted outcome to determine whether or not the model is biased.
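The idea of an average difference between actual and predicted outcomes can be sketched in base R. The helper `mean_difference()` is a hypothetical stand-in for illustration, not the bias() function used in the course, and the temperature values are made up.

```r
# Hypothetical helper: the average of (actual - predicted).
# A plain base R stand-in, not the course's bias() function.
mean_difference <- function(actual, predicted) {
  mean(actual - predicted)
}

# Made-up temperature readings for illustration.
actual_temp    <- c(70, 72, 68)
predicted_temp <- c(69, 71, 68)

avg_diff <- mean_difference(actual_temp, predicted_temp)
print(avg_diff)
```

A result close to zero suggests the predictions are not systematically off. Here the value is positive, so the actual temperatures tend to be higher than the predicted ones.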

Data Analysis with R Programming Weekly Challenge 3

1. A data analyst creates a data frame with data that has more than 50,000 observations in it. When they print their data frame, it slows down their console. To avoid this, they decide to switch to a tibble. Why would a tibble be more useful in this situation?

  • Tibbles only include a limited number of data items
  • Tibbles will automatically create row names to make the data easier to read
  • Tibbles will automatically change the names of variables to make them shorter and easier to read
  • Tibbles won’t overload the console because they automatically only print the first 10 rows of data and as many variables as will fit on the screen (Correct)

Correct: Tibbles make printing in R easier. They won’t accidentally overload the data analyst’s console because they’re automatically set to pull up only the first 10 rows and as many columns as fit on screen.

2. A data analyst is exploring their data to get more familiar with it. They want a preview of just the first six rows to get a better idea of how the data frame is laid out. What function should they use?

  • print()
  • colnames()
  • preview()
  • head() (Correct)

Correct: The head() function can be used to return a preview of the first six rows of a data frame. This is a useful way to explore a data frame and get more familiar with how it is structured.

3. You are working with the ToothGrowth dataset. You want to use the head() function to get a preview of the dataset. Write the code chunk that will give you this preview.

head(ToothGrowth)

What are the names of the columns in the ToothGrowth dataset?

  • len, supp, dose (Correct)
  • VC, supp, dose
  • len, supp, VC
  • len, VC, dose

Correct: The code chunk head(ToothGrowth) gives you a preview of the dataset. Inside the parentheses of the head() function is the name of the dataset you want to preview. The code returns a view of the column names and the first few rows of the dataset. The names of the columns in the ToothGrowth dataset are len, supp, dose.
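This can be checked directly in base R, since ToothGrowth ships with R. The variable name `preview` is only illustrative.

```r
# head() returns the first six rows by default.
preview <- head(ToothGrowth)
print(preview)

# The column names can also be listed on their own.
colnames(ToothGrowth)

# The n argument changes how many rows are returned.
head(ToothGrowth, n = 3)
```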

4. A data analyst is working with a data frame named cars. The analyst notices that all the column names in the data frame are capitalized. What code chunk lets the analyst change all the column names to lowercase?

  • rename_with(cars, toupper)
  • rename_with(tolower, cars)
  • rename_with(toupper, cars)
  • rename_with(cars, tolower) (Correct)

Correct: The code chunk is rename_with(cars, tolower). The rename_with() function will enable the analyst to easily change the case of the column names to lowercase. Including the tolower argument indicates that all column names will be changed to lowercase.
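A small sketch of this call, assuming dplyr is installed. The data frame `cars_upper` and its contents are hypothetical, and it is named differently from the question's cars data frame to avoid masking R's built-in cars dataset.

```r
library(dplyr)

# Hypothetical data frame whose column names are all capitalized.
cars_upper <- data.frame(
  MAKE = c("Audi", "Ford"),
  MPG  = c(28, 31)
)

# rename_with() applies the function tolower to every column name.
cars_lower <- rename_with(cars_upper, tolower)
colnames(cars_lower)
```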

5. A data analyst is working with the penguins dataset in R. What code chunk will allow them to sort the penguins data by the variable bill_length_mm?

  • arrange(=bill_length_mm)
  • arrange(bill_length_mm, penguins)
  • arrange(penguins, bill_length_mm) (Correct)
  • arrange(penguins)

Correct: The code chunk is arrange(penguins, bill_length_mm). The arrange function allows the analyst to sort data in their dataset. The arguments for the function identify the dataset as the penguins data, and that the sort should be based on the bill_length_mm variable. The data is automatically sorted in ascending order.

6. You are working with the penguins dataset. You want to use the summarize() and max() functions to find the maximum value for the variable flipper_length_mm. You write the following code:

penguins %>%
  drop_na() %>%
  group_by(species) %>%

Add the code chunk that lets you find the maximum value for the variable flipper_length_mm.

What is the maximum flipper length in mm for the Gentoo species?

  • 212
  • 231 (Correct)
  • 210
  • 200

Correct: The code chunk summarize(max(flipper_length_mm)) lets you find the maximum value for the variable flipper_length_mm. The correct code is penguins %>% drop_na() %>% group_by(species) %>% summarize(max(flipper_length_mm)). The summarize() function displays summary statistics. You can use the summarize() function in combination with other functions  — such as mean(), max(), and min() — to calculate specific statistics. In this case, you use max() to calculate the maximum value for flipper length. The maximum flipper length for the Gentoo species is 231mm.
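To see the pipeline's steps without loading the penguins data, here is a sketch on a small made-up data frame. It assumes dplyr and tidyr are installed; `measurements` and the column name `max_flipper` are illustrative choices.

```r
library(dplyr)
library(tidyr)

# Hypothetical measurements, including one missing value.
measurements <- data.frame(
  species           = c("A", "A", "B", "B", "B"),
  flipper_length_mm = c(190, NA, 210, 225, 218)
)

result <- measurements %>%
  drop_na() %>%                                      # remove rows with missing values
  group_by(species) %>%                              # one group per species
  summarize(max_flipper = max(flipper_length_mm))    # one maximum per group

print(result)
```

Naming the summary column (`max_flipper = ...`) is optional, but it gives the result a cleaner column name than the default `max(flipper_length_mm)`.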

7. A data analyst is working with a data frame called salary_data. They want to create a new column named total_wages that adds together data in the standard_wages and overtime_wages columns. What code chunk lets the analyst create the total_wages column?

  • mutate(total_wages = standard_wages + overtime_wages)
  • mutate(salary_data, total_wages = standard_wages + overtime_wages) (Correct)
  • mutate(salary_data, standard_wages = total_wages + overtime_wages)
  • mutate(salary_data, total_wages = standard_wages * overtime_wages)

Correct: The code chunk is mutate(salary_data, total_wages = standard_wages + overtime_wages). The analyst can use the mutate() function to create a new column for standard_wages plus overtime_wages called total_wages. The mutate() function can create a new column without affecting any existing columns.
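A minimal sketch of this call, assuming dplyr is installed. The wage figures are made up for illustration.

```r
library(dplyr)

# Hypothetical wage data.
salary_data <- data.frame(
  standard_wages = c(1000, 1200),
  overtime_wages = c(150, 0)
)

# mutate() returns a new data frame with the extra column.
with_total <- mutate(salary_data, total_wages = standard_wages + overtime_wages)
print(with_total)

# The original data frame still has only its two columns.
ncol(salary_data)
```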

8. A data analyst is working with a data frame named stores. It has separate columns for city (city) and state (state). The analyst wants to combine the two columns into a single column named location, with the city and state separated by a comma. What code chunk lets the analyst create the location column?

  • unite(stores, "location", city, state, sep=",") (Correct)
  • unite(stores, "location", city, state)
  • unite(stores, "location", city, sep=",")
  • unite(stores, city, state, sep=",")

Correct: The code chunk unite(stores, "location", city, state, sep=",") lets the analyst create the location column. The unite() function lets the analyst combine the city and state data into a single column. In the parentheses of the function, the analyst writes the name of the data frame, then the name of the new column in quotation marks, followed by the names of the two columns they want to combine. Finally, the argument sep="," places a comma between the city and state data in the location column.
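The call can be tried on a small made-up data frame, assuming tidyr (which provides unite()) is installed. The city and state values are hypothetical.

```r
library(tidyr)

# Hypothetical store locations.
stores <- data.frame(
  city  = c("Austin", "Denver"),
  state = c("TX", "CO")
)

# Combine city and state into one column, separated by a comma.
located <- unite(stores, "location", city, state, sep = ",")
print(located)
```

By default unite() removes the input columns, so the result contains only the new location column.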

9. In R, which statistical measure demonstrates how strong the relationship is between two variables?

  • Standard deviation
  • Average
  • Maximum
  • Correlation (Correct)

Correct: Correlation measures how strong the relationship between two variables is. This is represented by the cor() function.

10. A data analyst is studying weather data. They write the following code chunk:

bias(actual_temp, predicted_temp)

What will this code chunk calculate?

  • The maximum difference between the actual and predicted values
  • The minimum difference between the actual and predicted values
  • The average difference between the actual and predicted values (Correct)
  • The total average of the values

Correct: The bias() function can be used to calculate the average amount a predicted outcome and actual outcome differ in order to determine if the data model is biased.

11. A data analyst is working with a data frame called salary_data. They want to create a new column named hourly_salary that includes data from the wages column divided by 40. What code chunk lets the analyst create the hourly_salary column?

  • mutate(hourly_salary, salary_data = wages / 40)
  • mutate(salary_data, hourly_salary = wages * 40)
  • mutate(salary_data, hourly_salary = wages / 40) (Correct)
  • mutate(hourly_salary = wages / 40)

Correct: The code chunk is mutate(salary_data, hourly_salary = wages / 40). The analyst can use the mutate() function to create a new column for wages divided by 40 called hourly_salary. The mutate() function can create a new column without affecting any existing columns.

12. A data analyst wants a high level summary of the structure of their data frame, including the column names, the number of rows and variables, and type of data within a given column. What function should they use?

  • colnames()
  • rename_with()
  • str() (CORRECT)
  • head()
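For example, str() can be run on the built-in ToothGrowth data frame in base R. The variable `column_types` is only an illustrative name.

```r
# str() prints the number of rows and variables, then each column's
# name, data type, and first few values.
str(ToothGrowth)

# The same type information can be collected programmatically.
column_types <- sapply(ToothGrowth, class)
print(column_types)
```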

13. You are working with the ToothGrowth dataset. You want to use the select() function to view all columns except the supp column. Write the code chunk that will give you this view.

How many columns does the resulting data frame contain?

  • 1
  • 3
  • 2 (CORRECT)
  • 4
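One way to write this, assuming dplyr is installed, is to put a minus sign in front of the column to exclude. The name `without_supp` is illustrative.

```r
library(dplyr)

# A minus sign inside select() keeps every column except the one named.
without_supp <- select(ToothGrowth, -supp)

colnames(without_supp)
ncol(without_supp)
```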

14. You are working with the penguins dataset. You want to use the summarize() and min() functions to find the minimum value for the variable bill_depth_mm. At this point, the following code has already been written into the script:

penguins %>%

 drop_na() %>%

 group_by(species) %>%

Add the code chunk that lets you find the minimum value for the variable bill_depth_mm.

(Note: do not type the above code into the code block editor, as it has already been inputted. Simply add a single line of code based on the prompt.)

 What is the minimum bill depth in mm for the Chinstrap species?

  • 12.4
  • 13.1
  • 15.5
  • 16.4 (CORRECT)

Correct: The code chunk summarize(min(bill_depth_mm)) lets you find the minimum value for the variable bill_depth_mm. The correct code is penguins %>% drop_na() %>% group_by(species) %>% summarize(min(bill_depth_mm)). The summarize() function displays summary statistics. You can use the summarize() function in combination with other functions — such as mean(), max(), and min() — to calculate specific statistics. In this case, you use min() to calculate the minimum value for bill depth. The minimum bill depth for the Chinstrap species is 16.4mm.

15. A data analyst wants to find out how much the predicted outcome and the actual outcome of their data model differ. What function can they use to quickly measure this?

  • mean()
  • bias() (CORRECT)
  • sd()
  • cor()

16. You are working with the ToothGrowth dataset. You want to use the skim_without_charts() function to get a comprehensive view of the dataset. Write the code chunk that will give you this view. 

 What is the average value of the len column?

  • 13.1
  • 18.8 (CORRECT)
  • 4.2
  • 7.65

17. A data analyst is working with the penguins dataset and wants to sort the penguins by body_mass_g from least to greatest. When they run the following code the penguin body mass data is not displayed in the correct order.

penguins %>% arrange(body_mass_g)

head(penguins)

What can the data analyst do to fix their code?

  • Use the print() function instead of the head() function
  • Correct the capitalization of arrange() to Arrange()
  • Save the results of arrange() to a variable that gets passed to head() (CORRECT)
  • Add a minus sign in front of body_mass_g to reverse the order
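The reason is that arrange() returns a sorted copy and leaves the original data frame untouched. A sketch with a small made-up data frame (`flock`), assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical stand-in for the penguins data.
flock <- data.frame(
  id          = c("a", "b", "c"),
  body_mass_g = c(4200, 3100, 5000)
)

# The sorted result is printed and then discarded.
flock %>% arrange(body_mass_g)

# flock itself is still in its original order.
head(flock)

# Saving the result keeps the sorted version for later steps.
sorted_flock <- flock %>% arrange(body_mass_g)
head(sorted_flock)
```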

18. You are working with the penguins dataset and want to understand the year of data collection for all combinations of species, island, and sex. At this point, the following code has already been written into your script:

penguins %>%

 drop_na() %>%

 group_by(species, island, sex) %>%

 summarize(min = min(year), max = max(year))

When you run the code in the code box, how many separate observational rows are returned by this code chunk?

  • 10 (CORRECT)
  • 6
  • 2
  • 3

19. A data analyst is working with a data frame called athletes. The data frame contains a column named record that represents an athlete’s wins and losses separated by a hyphen (-). They want to turn this single column into individual columns for wins and losses. Which code chunk lets the analyst split the record column?

  • separate(athletes, record, into=c("wins", "losses"), delim="-")
  • separate(athletes, record, into=c("wins", "losses"), sep="-") (CORRECT)
  • separate(record, athletes, into=c("wins", "losses"), sep="-")
  • separate(record, athletes, into=c("wins", "losses"), delim="-")
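A sketch of the separate() call on made-up data, assuming tidyr is installed. The athlete names and records are hypothetical.

```r
library(tidyr)

# Hypothetical win-loss records stored as "wins-losses".
athletes <- data.frame(
  name   = c("Kim", "Lee"),
  record = c("10-2", "7-5")
)

# Split record at the hyphen into two new columns.
split_record <- separate(athletes, record, into = c("wins", "losses"), sep = "-")
print(split_record)
```

The new columns are character strings by default. Adding `convert = TRUE` to the call asks separate() to convert them to numbers where possible.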

20. A data analyst is working with a data frame named stores. It has separate columns for city (city) and state (state). The analyst wants to combine the two columns into a single column named location, with the city and state separated by a comma. What code chunk lets the analyst create the location column?

  • unite(stores, "location", city, state, sep=",") (CORRECT)
  • unite(stores, "location", city, state)
  • unite(stores, "location", city, sep=",")
  • unite(stores, city, state, sep=",")

21.  A data analyst is working with the penguins dataset in R. What code chunk will allow them to sort the penguins data by the variable bill_length_mm?

  • arrange(=bill_length_mm)
  • arrange(penguins, bill_length_mm) (CORRECT)
  • arrange(bill_length_mm, penguins)
  • arrange(penguins)

22. A data analyst is working with a data frame called salary_data. They want to create a new column named total_wages that adds together data in the standard_wages and overtime_wages columns. What code chunk lets the analyst create the total_wages column?

  • mutate(salary_data, total_wages = standard_wages + overtime_wages) (CORRECT)
  • mutate(total_wages = standard_wages + overtime_wages)
  • mutate(salary_data, standard_wages = total_wages + overtime_wages)
  • mutate(salary_data, total_wages = standard_wages * overtime_wages)

23. What scenarios would prevent you from being able to use a tibble?

  • You need to store numerical data
  • You need to create column names
  • You need to create row names (CORRECT)
  • You need to change the data types of inputs (CORRECT)
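The row-name limitation can be seen with the built-in mtcars data frame, whose row names are car models. This sketch assumes the tibble package is installed; `cars_tbl`, `cars_named`, and the column name `model` are illustrative choices.

```r
library(tibble)

# mtcars stores the car models as row names.
has_rownames(mtcars)

# Converting to a tibble drops the row names by default.
cars_tbl <- as_tibble(mtcars)
has_rownames(cars_tbl)

# To keep that information, move the row names into a regular column.
cars_named <- as_tibble(mtcars, rownames = "model")
head(cars_named$model, 3)
```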

24. You are working with the ToothGrowth dataset. You want to use the skim_without_charts() function to get a comprehensive view of the dataset. Write the code chunk that will give you this view.

 How many rows does the ToothGrowth dataset contain?

  • 50
  • 40
  • 60 (CORRECT)
  • 25

Correct: The code chunk skim_without_charts(ToothGrowth) gives you a comprehensive view of the dataset. Inside the parentheses of the skim_without_charts() function is the name of the dataset you want to view. The code returns a summary with the name of the dataset and the number of rows and columns. It also shows the column types and data types contained in the dataset. The ToothGrowth dataset contains 60 rows.

25. In R, which statistical measure demonstrates how strong the relationship is between two variables?

  • Standard deviation
  • Average
  • Correlation (CORRECT)
  • Maximum

26. A data analyst is studying weather data. They write the following code chunk:

bias(actual_temp, predicted_temp)

What will this code chunk calculate?

  • The minimum difference between the actual and predicted values
  • The average difference between the actual and predicted values (CORRECT)
  • The maximum difference between the actual and predicted values
  • The total average of the values

27. A data analyst wants to learn more about a specific data frame. Which function will allow them to review the data types of each column in the data frame?

  • colnames()
  • package()
  • library()
  • str() (CORRECT)

28. You are working with the ToothGrowth dataset. You want to use the glimpse() function to get a quick summary of the dataset. Write the code chunk that will give you this summary. 

 How many different data types are used for the column data types?

  • 2 (CORRECT)
  • 3
  • 60
  • 1

29. You are working with the penguins dataset. You want to use the summarize() and mean() functions to find the mean value for the variable body_mass_g. At this point, the following code has already been written into your script:

penguins %>%

drop_na() %>%

group_by(species) %>%

Add the code chunk that lets you find the mean value for the variable body_mass_g.

(Note: do not type the above code into the code block editor, as it has already been inputted. Simply add a single line of code based on the prompt.)

What is the mean body mass in g for the Adelie species?

  • 3733.088
  • 3706.164 (CORRECT)
  • 5092.437
  • 4207.433

Correct: The code chunk summarize(mean(body_mass_g)) lets you find the mean value for the variable body_mass_g. The correct code is penguins %>% drop_na() %>% group_by(species) %>% summarize(mean(body_mass_g)). The summarize() function displays summary statistics. You can use the summarize() function in combination with other functions — such as mean(), max(), and min() — to calculate specific statistics. In this case, you use mean() to calculate the mean value for body mass. The mean body mass for the Adelie species is 3706.164g.

30. A data analyst is working with a data frame called sales. In the data frame, a column named location represents data in the format "city, state". The analyst wants to split the city into an individual city column and the state into a new country column. What code chunk lets the analyst split the location column?

  • separate(sales, location, into=c("country", "city"), sep=", ")
  • separate(sales, location, into=c("city", "country"), sep=", ") (CORRECT)
  • separate(sales, location, into=c("country", "city"), sep=" ")
  • untie(sales, location, into=c("city", "country"), sep=", ")

31. What is an advantage of using data frames instead of tibbles?

  • Data frames make printing easier
  • Data frames allow you to create row names (CORRECT)
  • Data frames allow you to use column names
  • Data frames never change variable names

32. A data analyst is checking a script for one of their peers. They want to learn more about a specific data frame. What function(s) will allow them to see a subset of data values in the data frame? Select all that apply.

  • library()
  • colnames()
  • head() (CORRECT)
  • str() (CORRECT)

Working with Data in R CONCLUSION

The R programming language is a powerful tool for data analysis. In this part of the course, you’ve learned about how R can help you structure, organize, and clean your data using functions and other processes.

You’ve also seen how data frames work in R and how to deal with data bias. These are all important skills for anyone working with data. If you want to learn more about R and continue your journey toward becoming a data analyst, join us on Coursera today.