
Course 7 – Data Analysis with R Programming Quiz Answers

Week 3: Working with Data in R

GOOGLE DATA ANALYTICS PROFESSIONAL CERTIFICATE

Coursera RStudio Answers Study Guide

Working with Data in R INTRODUCTION

Working with data in R is an important step in the data analysis process. In this part of the Google Data Analytics Professional Certificate course on Coursera, you’ll learn how to use functions and other processes to structure, organize, and clean your data in R. You will explore the data frame, an object that stores tabular data, and learn how to work with data frames in R.

Additionally, you will consider potential sources of bias in datasets, such as selection bias or confounding variables, and understand how R can help detect these issues. Working through this course module gives you the skills needed to effectively manage your data before further analyzing it in more depth with other tools and techniques.

Learning Objectives

  • Discuss how R functions may be used to address issues of bias and relationship between data variables
  • Describe R functions that may be used to clean and organize data
  • Describe functions used to work with data frames including read_csv(), data(), and datapasta()
  • Discuss the difference between tibbles and tribbles
  • Compare and contrast data cleaning with different tools
  • Create and work with data in R

Test your knowledge on R data frames

1. Which of the following are best practices for creating data frames? Select all that apply.

  • All data stored should be the same type
  • Rows should be named
  • Columns should be named (Correct)
  • Each column should contain the same number of data items (Correct)

Correct: When creating data frames, columns should be named and each column should contain the same number of data items.
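As a minimal sketch of these two practices, the base R code below builds a small data frame. The data frame name, column names, and values are made up for illustration. Note that the columns hold different data types, which is allowed.

```r
# Hypothetical example data: three named columns, each with four items.
employees <- data.frame(
  name = c("Ana", "Ben", "Chloe", "Dev"),
  department = c("Sales", "IT", "Sales", "HR"),
  years = c(3, 7, 1, 5)
)

# Every column is named and has the same length, so this succeeds.
print(colnames(employees))
print(nrow(employees))

# Columns whose lengths do not line up (3 items vs. 2) are rejected.
# try() captures the error so the script can continue.
bad <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
print(inherits(bad, "try-error"))
```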

2. Why are tibbles a useful variation of data frames?

  • Tibbles make printing easier (Correct)
  • Tibbles make changing the names of variables easier
  • Tibbles can change the data type of inputs
  • Tibbles can create row names

Correct: Tibbles make printing easier. They also help you avoid overloading your console when working with large datasets: by default, a tibble prints only the first ten rows of a dataset and as many columns as fit on the screen.
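The printing behavior can be seen in a short sketch like the one below. It assumes the tibble package (part of the tidyverse) is installed; the data frame big_df and its contents are hypothetical.

```r
library(tibble)  # assumes the tibble package is installed

# Hypothetical data frame with 50,000 rows.
big_df <- data.frame(id = 1:50000, value = rep(c(1.5, 2.5), times = 25000))

# Converting to a tibble changes how the data prints, not the data itself.
big_tbl <- as_tibble(big_df)

print(big_tbl)       # prints only the first 10 rows and a note about the rest
print(dim(big_tbl))  # the tibble still holds all 50000 rows and 2 columns
```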

3. Tidy data is a way of standardizing the organization of data within R.

  • True (Correct)
  • False

Correct: Tidy data refers to the principles that make data structures meaningful and easy to understand. It’s a way of standardizing the organization of data within R.

4. Which R function can be used to make changes to a data frame?

  • str()
  • mutate() (Correct)
  • head()
  • colnames()

Correct: The mutate() function can be used to make changes to a data frame.

Test your knowledge on cleaning data

1. A data analyst is cleaning their data in R. They want to be sure that their column names are unique and consistent to avoid any errors in their analysis. What R function can they use to do this automatically?

  • rename()
  • clean_names() (Correct)
  • select()
  • rename_with()

Correct: The clean_names() function will automatically make sure that column names are unique and consistent.
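A rough illustration follows, assuming the janitor package (which provides clean_names()) is installed. The survey data frame and its messy column names are invented for this example.

```r
library(janitor)  # assumes the janitor package is installed

# Hypothetical data frame with inconsistent column names.
# check.names = FALSE keeps the messy names exactly as typed.
survey <- data.frame(
  "First Name" = c("Ana", "Ben"),
  "Last.Name" = c("Lopez", "Okafor"),
  "AGE" = c(34, 41),
  check.names = FALSE
)

# clean_names() returns a copy of the data frame with tidied column names.
cleaned <- clean_names(survey)
print(colnames(cleaned))
```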

2. You are working with the penguins dataset. You want to use the arrange() function to sort the data for the column bill_length_mm in ascending order. You write the following code:

penguins %>%

Add a code chunk to sort the column bill_length_mm in ascending order.

What is the shortest bill length in mm?
  • 33.1
  • 33.5
  • 34.0
  • 32.1 (Correct)

Correct: You add the code chunk arrange(bill_length_mm) to sort the column bill_length_mm in ascending order. The correct code is penguins %>% arrange(bill_length_mm). Inside the parentheses of the arrange() function is the name of the variable you want to sort by. The code returns a tibble that displays the data for bill_length_mm from shortest to longest. The shortest bill length is 32.1 mm.

3. A data analyst is working with customer information from their company’s sales data. The first and last names are in separate columns, but they want to create one column with both names instead. Which of the following functions can they use?

  • arrange()
  • unite() (Correct)
  • select()
  • separate()

Correct: The unite() function can be used to combine columns.

Test your knowledge on R functions

1. Which of the following functions can a data analyst use to get a statistical summary of their dataset? Select all that apply.

  • mean() (Correct)
  • ggplot2()
  • cor() (Correct)
  • sd() (Correct)

Correct: The sd(), cor(), and mean() functions can provide a statistical summary of the dataset using standard deviation, correlation, and mean.

2. A data analyst inputs the following command:

quartet %>% group_by(set) %>% summarize(mean(x), sd(x), mean(y), sd(y), cor(x, y))

Which of the functions in this command can help them determine how strongly related their variables are?

  • sd(x)
  • cor(x,y) (Correct)
  • sd(y)
  • mean(y)

Correct: The cor() function returns the correlation between two variables, a value between -1 and 1 that indicates how strong the relationship between them is.
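The quartet data frame used in the question is not part of base R, so the sketch below assumes the built-in anscombe dataset is an acceptable stand-in. It applies mean(), sd(), and cor() to the first x/y pair only.

```r
# anscombe ships with base R; x1 and y1 are the first of its four x/y pairs.
data(anscombe)

x <- anscombe$x1
y <- anscombe$y1

print(mean(x))    # average of x
print(sd(x))      # spread of x around its mean
print(cor(x, y))  # strength and direction of the linear relationship

# cor() always returns a value between -1 and 1;
# values near 0 suggest a weak linear relationship.
```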

3. Fill in the blank: The bias function compares the actual outcome of the data with the _____ outcome to determine whether or not the model is biased.

  • desired
  • probable
  • final
  • predicted (Correct)

Correct: The bias function compares the actual outcome of the data with the predicted outcome to determine whether or not the model is biased.


Data Analysis with R Programming Weekly Challenge 3

1. A data analyst creates a data frame with data that has more than 50,000 observations in it. When they print their data frame, it slows down their console. To avoid this, they decide to switch to a tibble. Why would a tibble be more useful in this situation?

  • Tibbles only include a limited number of data items
  • Tibbles will automatically create row names to make the data easier to read
  • Tibbles will automatically change the names of variables to make them shorter and easier to read
  • Tibbles won’t overload the console because they automatically only print the first 10 rows of data and as many variables as will fit on the screen (Correct)

Correct: Tibbles make printing in R easier. They won’t accidentally overload the data analyst’s console because they’re automatically set to pull up only the first 10 rows and as many columns as fit on screen.

2. A data analyst is exploring their data to get more familiar with it. They want a preview of just the first six rows to get a better idea of how the data frame is laid out. What function should they use?

  • print()
  • colnames()
  • preview()
  • head() (Correct)

Correct: The head() function can be used to return a preview of the first six rows of a data frame. This is a useful way to explore a data frame and get more familiar with how it is structured.

3. You are working with the ToothGrowth dataset. You want to use the head() function to get a preview of the dataset. Write the code chunk that will give you this preview.

  • head(ToothGrowth)

What are the names of the columns in the ToothGrowth dataset?

  • len, supp, dose (Correct)
  • VC, supp, dose
  • len, supp, VC
  • len, VC, dose

Correct: The code chunk head(ToothGrowth) gives you a preview of the dataset. Inside the parentheses of the head() function is the name of the dataset you want to preview. The code returns a view of the column names and the first few rows of the dataset. The names of the columns in the ToothGrowth dataset are len, supp, dose.
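For reference, a short base R sketch of this preview is shown below. ToothGrowth ships with R's built-in datasets, so no extra packages are assumed; the variable name preview is arbitrary.

```r
# ToothGrowth is available in base R without loading any package.
preview <- head(ToothGrowth)

print(preview)            # column names plus the first six rows (the default)
print(colnames(preview))  # the three column names on their own

# The optional n argument changes how many rows are returned.
print(head(ToothGrowth, n = 3))
```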

4. A data analyst is working with a data frame named cars. The analyst notices that all the column names in the data frame are capitalized. What code chunk lets the analyst change all the column names to lowercase?

  • rename_with(cars, toupper)
  • rename_with(tolower, cars)
  • rename_with(toupper, cars)
  • rename_with(cars, tolower) (Correct)

Correct: The code chunk is rename_with(cars, tolower). The rename_with() function will enable the analyst to easily change the case of the column names to lowercase. Including the tolower argument indicates that all column names will be changed to lowercase.
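A small sketch of this call is given below, assuming the dplyr package is installed. The data frame cars_data and its columns are hypothetical stand-ins for the cars data frame in the question.

```r
library(dplyr)  # assumes the dplyr package is installed

# Hypothetical data frame whose column names are all capitalized.
cars_data <- data.frame(
  MAKE = c("Audi", "Ford"),
  MODEL = c("A4", "Focus"),
  YEAR = c(2019, 2021)
)

# rename_with() applies the function tolower to every column name
# and returns a new data frame; cars_data itself is not modified.
lower_cars <- rename_with(cars_data, tolower)

print(colnames(lower_cars))
print(colnames(cars_data))
```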

5. A data analyst is working with the penguins dataset in R. What code chunk will allow them to sort the penguins data by the variable bill_length_mm?

  • arrange(=bill_length_mm)
  • arrange(bill_length_mm, penguins)
  • arrange(penguins, bill_length_mm) (Correct)
  • arrange(penguins)

Correct: The code chunk is arrange(penguins, bill_length_mm). The arrange() function allows the analyst to sort data in their dataset. The first argument identifies the dataset as the penguins data, and the second specifies that the sort should be based on the bill_length_mm variable. The data is automatically sorted in ascending order.
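The sorting behavior can be sketched as follows, assuming dplyr is installed. The birds data frame is a made-up stand-in, not the real penguins data, and desc() is shown only as the usual way to reverse the default order.

```r
library(dplyr)  # assumes the dplyr package is installed

# Hypothetical measurements, not the real penguins data.
birds <- data.frame(
  id = c("a", "b", "c", "d"),
  bill_length_mm = c(39.1, 32.1, 46.5, 36.7)
)

# Function-call form: data frame first, then the sorting variable.
ascending <- arrange(birds, bill_length_mm)

# Pipe form: the data frame is passed in as the first argument.
piped <- birds %>% arrange(bill_length_mm)

# Wrapping the variable in desc() sorts from largest to smallest.
descending <- arrange(birds, desc(bill_length_mm))

print(ascending)
print(descending)
```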

6. You are working with the penguins dataset. You want to use the summarize() and max() functions to find the maximum value for the variable flipper_length_mm. You write the following code:

penguins %>%
  drop_na() %>%
  group_by(species) %>%

Add the code chunk that lets you find the maximum value for the variable flipper_length_mm.


What is the maximum flipper length in mm for the Gentoo species?

  • 212
  • 231 (Correct)
  • 210
  • 200

Correct: The code chunk summarize(max(flipper_length_mm)) lets you find the maximum value for the variable flipper_length_mm. The correct code is penguins %>% drop_na() %>% group_by(species) %>% summarize(max(flipper_length_mm)). The summarize() function displays summary statistics. You can use the summarize() function in combination with other functions, such as mean(), max(), and min(), to calculate specific statistics. In this case, you use max() to calculate the maximum value for flipper length. The maximum flipper length for the Gentoo species is 231 mm.
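The same pipeline can be tried on a small example, sketched below under the assumption that dplyr and tidyr are installed. The flippers data frame is invented and does not contain the real penguins values; naming the summary column max_flipper is an optional choice made here for readability.

```r
library(dplyr)  # assumes the dplyr package is installed
library(tidyr)  # assumes the tidyr package is installed; provides drop_na()

# Hypothetical data with one missing value, not the real penguins data.
flippers <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"),
  flipper_length_mm = c(181, 195, 217, NA, 230)
)

result <- flippers %>%
  drop_na() %>%                   # remove rows with missing values
  group_by(species) %>%           # one group per species
  summarize(max_flipper = max(flipper_length_mm))  # one row per group

print(result)
```

Without drop_na(), max() would return NA for any group that contains a missing value.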

7. A data analyst is working with a data frame called salary_data. They want to create a new column named total_wages that adds together data in the standard_wages and overtime_wages columns. What code chunk lets the analyst create the total_wages column?

  • mutate(total_wages = standard_wages + overtime_wages)
  • mutate(salary_data, total_wages = standard_wages + overtime_wages) (Correct)
  • mutate(salary_data, standard_wages = total_wages + overtime_wages)
  • mutate(salary_data, total_wages = standard_wages * overtime_wages)

Correct: The code chunk is mutate(salary_data, total_wages = standard_wages + overtime_wages). The analyst can use the mutate() function to create a new column for standard_wages plus overtime_wages called total_wages. The mutate() function can create a new column without affecting any existing columns.
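A minimal sketch of this mutate() call follows, assuming dplyr is installed. The contents of salary_data are hypothetical values chosen for illustration.

```r
library(dplyr)  # assumes the dplyr package is installed

# Hypothetical wage data.
salary_data <- data.frame(
  employee = c("Ana", "Ben"),
  standard_wages = c(1000, 1200),
  overtime_wages = c(150, 0)
)

# mutate() returns a new data frame with the extra column appended.
with_total <- mutate(salary_data, total_wages = standard_wages + overtime_wages)

print(with_total)

# The original data frame is unchanged unless the result is assigned back to it.
print(colnames(salary_data))
```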

8. A data analyst is working with a data frame named stores. It has separate columns for city (city) and state (state). The analyst wants to combine the two columns into a single column named location, with the city and state separated by a comma. What code chunk lets the analyst create the location column?

  • unite(stores, "location", city, state, sep=",") (Correct)
  • unite(stores, "location", city, state)
  • unite(stores, "location", city, sep=",")
  • unite(stores, city, state, sep=",")

Correct: The code chunk unite(stores, "location", city, state, sep=",") lets the analyst create the location column. The unite() function lets the analyst combine the city and state data into a single column. In the parentheses of the function, the analyst writes the name of the data frame, then the name of the new column in quotation marks, followed by the names of the two columns they want to combine. Finally, the argument sep="," places a comma between the city and state data in the location column.
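The call can be tried on a small example such as the one below, which assumes the tidyr package is installed. The rows in stores are made up for illustration.

```r
library(tidyr)  # assumes the tidyr package is installed

# Hypothetical store data.
stores <- data.frame(
  store_id = c(1, 2),
  city = c("Austin", "Denver"),
  state = c("TX", "CO")
)

# Combine city and state into one column, separated by a comma.
located <- unite(stores, "location", city, state, sep = ",")

print(located)

# By default, unite() drops the input columns from the result.
print(colnames(located))
```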

9. In R, which statistical measure demonstrates how strong the relationship is between two variables?

  • Standard deviation
  • Average
  • Maximum
  • Correlation (Correct)

Correct: Correlation measures how strong the relationship between two variables is. In R, it is calculated with the cor() function.

10. A data analyst is studying weather data. They write the following code chunk:

bias(actual_temp, predicted_temp)

What will this code chunk calculate?

  • The maximum difference between the actual and predicted values
  • The minimum difference between the actual and predicted values
  • The average difference between the actual and predicted values (Correct)
  • The total average of the values

Correct: The bias() function can be used to calculate the average amount a predicted outcome and actual outcome differ in order to determine if the data model is biased.
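Because bias() is not part of base R, the sketch below reproduces the idea by hand rather than calling it. It assumes the calculation is the average of the first argument minus the second; mean_difference is a hypothetical helper written for this example, and the temperatures are invented.

```r
# Hypothetical temperatures for four days.
actual_temp <- c(70, 72, 68, 75)
predicted_temp <- c(69, 71, 68, 73)

# mean_difference is a hypothetical helper, not a library function.
# It averages the element-wise differences between actual and predicted values.
mean_difference <- function(actual, predicted) {
  mean(actual - predicted)
}

avg_diff <- mean_difference(actual_temp, predicted_temp)
print(avg_diff)  # 1: the differences are 1, 1, 0, and 2
```

A result close to zero suggests the predictions are not systematically too high or too low. Here the positive result means the actual temperatures ran about one degree above the predictions on average.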

11. A data analyst is working with a data frame called salary_data. They want to create a new column named hourly_salary that includes data from the wages column divided by 40. What code chunk lets the analyst create the hourly_salary column?

  • mutate(hourly_salary, salary_data = wages / 40)
  • mutate(salary_data, hourly_salary = wages * 40)
  • mutate(salary_data, hourly_salary = wages / 40) (Correct)
  • mutate(hourly_salary = wages / 40)

Correct: The code chunk is mutate(salary_data, hourly_salary = wages / 40). The analyst can use the mutate() function to create a new column for wages divided by 40 called hourly_salary. The mutate() function can create a new column without affecting any existing columns.

Working with Data in R CONCLUSION

The R programming language is a powerful tool for data analysis. In this part of the course, you’ve learned about how R can help you structure, organize, and clean your data using functions and other processes.

You’ve also seen how data frames work in R and how to check data for bias. These are important skills for anyone working with data. To learn more about R and continue toward becoming a data analyst, carry on with the course on Coursera.
