Documentation and Reports

Course 7 – Data Analysis with R Programming Quiz Answers

Week 5: Documentation and Report

GOOGLE DATA ANALYTICS PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

DOCUMENTATION AND REPORT INTRODUCTION

This part of Coursera’s Google Data Analytics Professional Certificate course, Data Analysis with R Programming, focuses on using R Markdown to create dynamic documents. You will learn how to structure and format an R Markdown file, incorporate R code chunks into it, and export it to other formats. You will also find out how to include tables, images, and other elements in a report, and how templates can save time when you produce the same kind of report repeatedly.

Learning Objectives

  • Demonstrate an understanding of how to export R Markdown notebooks
  • Incorporate R code chunks into R Markdown notebooks
  • Use basic formatting in R Markdown to create structure and emphasize content
  • Describe the R Markdown notebooks and their use to document R programming code
  • Create and outline a structure for an R Markdown notebook
  • Access and use a customized R Markdown template included in an R package
  • Demonstrate an understanding of the uses of R Markdown templates

Test your knowledge about documentation and report

1. Fill in the blank: Markdown is a _____ for formatting plain text files.

  • syntax (Correct)
  • coding language
  • guide
  • file application

Correct: Markdown is a syntax for formatting plain text files.

2. A data analyst creates an interactive version of their R Markdown document to share with other users that allows them to execute code the analyst wrote. What did they create?

  • An HTML report
  • A code chunk
  • An R notebook (Correct)
  • A markdown

Correct: They created an R notebook, which is an interactive R Markdown option. It lets users run code from the R Markdown document and displays charts and graphs to visualize that code.

3. A data analyst wants to convert their R Markdown file into another format. What are their options? Select all that apply.

  • Slide presentation (Correct)
  • JPEG, PNG, and GIF
  • Dashboard (Correct)
  • HTML, PDF, and Word (Correct)

Correct: R Markdown files can be converted into HTML, PDF and Word, slideshow presentations, or dashboards.

4. A data analyst has finished editing their R Markdown file and wants to save it as an HTML report. What tool will they use?

  • Output
  • Hashtags
  • Knit (Correct)
  • Save

Correct: The knit button will produce a report containing all text, code, and results from the R Markdown file.

5. R Markdown allows you to create a record of the steps you took to complete your analysis directly in RStudio.

  • True (Correct)
  • False

6. Where in RStudio can you find the export menu for saving plots?

  • The R console pane
  • The plots tab (Correct)
  • The source editor pane
  • The environment pane

Correct: In RStudio, you can find the export menu for saving plots in the plots tab.

Test your knowledge on code chunks

1. Fill in the blank: A delimiter is a character that marks the beginning and end of _____.

  • a data item (Correct)
  • an .rmd file
  • an HTML report
  • a command line

Correct: A delimiter is a character that marks the beginning and end of a data item. It can mark a single line of code, or a whole section of code in an .rmd file.

2. A data analyst has to create a monthly report for their stakeholders. What can they create to help them save time generating these reports?

  • .rmd file
  • Template (Correct)
  • HTML report
  • R notebook

Correct: Creating a template for your reports allows you to run one line of code to update your data without having to recreate the report from scratch. Templates can also help you customize the appearance of your final report.

3. A data analyst wants to mark the beginning of their code chunk. What delimiter should they type in their .rmd file?

  • ```{r } (Correct)
  • ***{r }
  • ==={r }
  •  +++{r }

Correct: Three backticks followed by the letter r in braces (```{r }) indicates the beginning of a code chunk in an .rmd file.
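As a rough illustration of how these pieces fit together, the base R sketch below writes a minimal .rmd file to a temporary folder. The file name, title, and chunk label are made up for the example, and the three-backtick delimiter is built with strrep() only so that it does not clash with the formatting of this page.

```r
# Build the three-backtick delimiter programmatically.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",                                   # YAML metadata starts
  "title: \"Chocolate ratings\"",          # hypothetical title
  "output: html_document",                 # format that Knit will produce
  "---",                                   # YAML metadata ends
  "",
  "## Ratings overview",                   # two hashtags: a second-level header
  "",
  "* **Bold** text uses double asterisks",
  "* A leading asterisk starts a bullet",
  "",
  paste0(fence, "{r analysis}"),           # chunk start; "analysis" is the label
  "summary(cars)",                         # cars is a dataset that ships with R
  fence,                                   # chunk end
  "",
  "The dataset has `r nrow(cars)` rows."   # inline code
)

# Write the file to a temporary folder (an assumed location for this sketch).
rmd_path <- file.path(tempdir(), "report.Rmd")
writeLines(rmd_lines, rmd_path)

# Knitting from code instead of the Knit button, if rmarkdown is installed:
# rmarkdown::render(rmd_path)
```

Opening the resulting file in RStudio and pressing Knit should produce an HTML report containing the text, the code, and its output.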

4. A data analyst is inserting a line of code directly into their .rmd file. What will they use to mark the beginning and end of the code?

  • Delimiters (Correct)
  • Hashtags
  • Asterisks
  • Markdown

Correct: A delimiter is a character that indicates the beginning or end of a data item.

5. A data analyst notices that their header is much smaller than they wanted it to be. What happened?

  • They have too many asterisks
  • They have too few hashtags
  • They have too many hashtags (Correct)
  • They have too few asterisks

Correct: Hashtags can be used to change the font size of headers. The more hashtags you add, the smaller the header.

Data Analysis with R Programming Weekly Challenge 5

1. R Markdown is a file format for making dynamic documents with R. What are the benefits of creating this kind of document? Select all that apply.

  • Generate a report with executable code chunks (Correct)
  • Create a record of your cleaning process (Correct)
  • Save, organize, and document code (Correct)
  • Perform calculations for analysis more efficiently

Correct: R Markdown documents can be used to save, organize, and document code; create a record of your cleaning process; and generate reports with executable code for stakeholders.

2. Fill in the blank: R Markdown notebooks can be converted into HTML, PDF, and Word documents, slide presentations, and _____.

  • dashboards (Correct)
  • spreadsheets
  • tables
  • YAML

Correct: R Markdown notebooks can be converted into HTML, PDF, and Word documents, slide presentations, and dashboards.

3. A data analyst writes two hashtags next to their header. What will this do to the header font in the .rmd file?

  • Make it smaller (Correct)
  • Make it a different color
  • Make it centered
  • Make it bigger

Correct: Hashtags can be used to change the font size of headers. The more hashtags you add, the smaller the header.

4. A data analyst wants to include a line of code directly in their .rmd file in order to explain their process more clearly. What is this code called?

  • Inline code (Correct)
  • YAML
  • Documented
  • Markdown

Correct: Inline code is code that can be inserted directly into a .rmd file.

5. A data analyst wants to add a bulleted list to their R Markdown document. What symbol can they type to create this formatting?

  • Brackets
  • Hashtags
  • Delimiters
  • Asterisks (Correct)

Correct: An asterisk placed at the start of a line, before a word or phrase, turns that line into a bullet point in R Markdown.

6. Fill in the blank: Code added to an .rmd file is usually referred to as a code _____. This allows users to execute R code from within the .rmd file.

  • filter
  • section
  • chunk (Correct)
  • file

Correct: Code added to an .rmd file is usually referred to as a code chunk. Code chunks allow users to execute R code from within the .rmd file.

7. A data analyst adds specific characters before and after their code chunk to mark where the data item begins and ends in the .rmd file. What are these characters called?

  • Delimiters (Correct)
  • Syntax
  • Markdown
  • Backticks

Correct: A delimiter is a character that indicates the beginning or end of a data item in a code chunk. 

8. If an analyst creates the same kind of document over and over or customizes the appearance of a final report, they can use _____ to save them time.

  • an .rmd file
  • a filter
  • a code chunk
  • a template (Correct)

Correct: A template can save time when creating the same kind of document over and over or when customizing the appearance of a final report.

Data Analysis with R Programming Course Challenge

1. Scenario 1, questions 1-7

As part of the data science team at Gourmet Analytics, you use data analytics to advise companies in the food industry. You clean, organize, and visualize data to arrive at insights that will benefit your clients. As a member of a collaborative team, sharing your analysis with others is an important part of your job.

Your current client is Chocolate and Tea, an up-and-coming chain of cafes.

The eatery combines an extensive menu of fine teas with chocolate bars from around the world. Their diverse selection includes everything from plantain milk chocolate, to tangerine white chocolate, to dark chocolate with pistachio and fig. The encyclopedic list of chocolate bars is the basis of Chocolate and Tea’s brand appeal. Chocolate bar sales are the main driver of revenue.

Chocolate and Tea aims to serve chocolate bars that are highly rated by professional critics. They also continually adjust the menu to make sure it reflects the global diversity of chocolate production. The management team regularly updates the chocolate bar list in order to align with the latest ratings and to ensure that the list contains bars from a variety of countries.

They’ve asked you to collect and analyze data on the latest chocolate ratings. In particular, they’d like to know which countries produce the highest-rated bars of super dark chocolate (a high percentage of cocoa). This data will help them create their next chocolate bar menu.

Your team has received a dataset that features the latest ratings for thousands of chocolates from around the world. Given the data and the nature of the work you will do for your client, your team agrees to use R for this project.

Your supervisor asks you to write a short summary of the benefits of using R for the project. Which of the following benefits would you include in your summary? Select all that apply.

  • Quickly process lots of data (Correct)
  • Easily reproduce and share the analysis (Correct)
  • Define a problem and ask the right questions
  • Create high-quality data visualizations (Correct)

Correct: The benefits of using R for the project include the ability to quickly process lots of data and create high-quality data visualizations. You can also easily reproduce and share your analysis.

2. Scenario 1, continued

Before you begin working with your data, you need to import it and save it as a data frame. To get started, you open your RStudio workspace and load the tidyverse library. You upload a .csv file containing the data to RStudio and store it in a project folder named flavors_of_cacao.csv.

You use the read_csv() function to import the data from the .csv file. Assume that the name of the data frame is bars_df and the .csv file is in the working directory. What code chunk lets you create the data frame?

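  • bars_df + read_csv("flavors_of_cacao.csv")
  • read_csv("flavors_of_cacao.csv") + bars_df
  • bars_df %>% read_csv("flavors_of_cacao.csv")
  • bars_df <- read_csv("flavors_of_cacao.csv") (Correct)

Correct: The code chunk bars_df <- read_csv("flavors_of_cacao.csv") lets you create the data frame. The assignment operator <- stores the result of read_csv() in bars_df.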

3. Scenario 1, continued

Now that you’ve created a data frame, you want to find out more about how the data is organized. The data frame has hundreds of rows and lots of columns.

Assume the name of your data frame is flavors_df. What code chunk lets you review the column names in the data frame?

  • rename(flavors_df)
  • colnames(flavors_df) (Correct)
  • col(flavors_df)
  • arrange(flavors_df)

Correct: You write the code chunk colnames(flavors_df). In this code chunk:

colnames() is the function that will let you review the column names in the data frame.

flavors_df is the name of the data frame that the colnames() function takes for its argument.
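A minimal sketch of this call, using a small stand-in data frame. The values are invented for illustration and are not taken from the real dataset.

```r
# Toy data frame; the values are invented for illustration only.
flavors_df <- data.frame(
  Rating = c(3.75, 4.0, 3.5),
  Cocoa.Percent = c(70, 75, 80),
  Company.Location = c("France", "Canada", "U.S.A.")
)

# colnames() returns the column names as a character vector.
column_names <- colnames(flavors_df)
print(column_names)
```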

4. Scenario 1, continued

Next, you begin to clean your data. When you check out the column headings in your data frame you notice that the first column is named Company...Maker.if.known. (Note: The period after known is part of the variable name.) For the sake of clarity and consistency, you decide to rename this column Maker (without a period at the end).

Assume the first part of your code chunk is:

flavors_df %>%

What code chunk do you add to change the column name?

  • rename(Maker = Company...Maker.if.known.) (Correct)
  • rename(Maker %<% Company...Maker.if.known.)
  • rename(Company...Maker.if.known. = Maker)
  •  rename(Company...Maker.if.known %<% Maker)

Correct: You add the code chunk rename(Maker = Company...Maker.if.known.). In the rename() function, the new column name goes to the left of the equals sign and the existing column name goes to the right.
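The argument order of dplyr's rename() is new name = old name. A short sketch on a stand-in data frame (the maker names and ratings are invented, and the dplyr package is assumed to be installed):

```r
library(dplyr)

# Toy data frame with the original, awkward column name.
flavors_df <- data.frame(
  Company...Maker.if.known. = c("Maker A", "Maker B"),
  Rating = c(3.5, 4.0)
)

# rename() takes new_name = old_name.
renamed_df <- flavors_df %>%
  rename(Maker = Company...Maker.if.known.)

print(colnames(renamed_df))
```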

5. After previewing and cleaning your data, you determine what variables are most relevant to your analysis. Your main focus is on Rating, Cocoa.Percent, and Bean.Type. You decide to use the select() function to create a new data frame with only these three variables.

Assume the first part of your code is:

trimmed_flavors_df <- flavors_df %>%

Add the code chunk that lets you select the three variables.

What bean type appears in row 6 of your tibble? 

  • Criollo (Correct)
  • Beniano
  • Forastero
  • Trinitario

Correct: You add the code chunk select(Rating, Cocoa.Percent, Bean.Type) to select the three variables. The correct code is trimmed_flavors_df <- flavors_df %>% select(Rating, Cocoa.Percent, Bean.Type). In this code chunk:

The select() function lets you select specific variables for your new data frame.

select() takes the names of the variables you want to choose as its argument: Rating, Cocoa.Percent, Bean.Type.

The bean type Criollo appears in row 6 of your tibble.
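The same select() call can be tried on a small stand-in data frame. The rows below are invented, the Review.Date column is included only to show a column being dropped, and dplyr is assumed to be installed.

```r
library(dplyr)

# Toy data frame; values are invented for illustration only.
flavors_df <- data.frame(
  Rating = c(3.75, 4.0, 3.5),
  Cocoa.Percent = c(70, 75, 80),
  Bean.Type = c("Criollo", "Trinitario", "Forastero"),
  Review.Date = c(2015, 2016, 2017)
)

# Keep only the three variables of interest; Review.Date is dropped.
trimmed_flavors_df <- flavors_df %>%
  select(Rating, Cocoa.Percent, Bean.Type)

print(trimmed_flavors_df)
```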

6. Next, you select the basic statistics that can help your team better understand the ratings system in your data.

Assume the first part of your code is:

trimmed_flavors_df %>%

You want to use the summarize() and max() functions to find the maximum rating for your data. Add the code chunk that lets you find the maximum value for the variable Rating.

What is the maximum rating?

  • 4.5
  • 5.5
  • 6
  • 5 (Correct)

Correct: You add the code chunk summarize(max(Rating)) to find the maximum value for the variable Rating. The correct code is trimmed_flavors_df %>% summarize(max(Rating)). In this code chunk:

The summarize() function lets you display summary statistics. You can use the summarize() function in combination with other functions such as mean(), max(), and min() to calculate specific statistics.

In this case, you use max() to calculate the maximum value for the variable Rating.

The maximum rating is 5.
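A small sketch of summarize() combined with max(), run on invented ratings rather than the real dataset. Naming the result column (max_rating here) is optional and is a choice made for this example; dplyr is assumed to be installed.

```r
library(dplyr)

# Toy data frame; values are invented for illustration only.
trimmed_flavors_df <- data.frame(
  Rating = c(3.75, 4.0, 3.5),
  Cocoa.Percent = c(70, 75, 80)
)

# summarize() collapses the data frame to one row of summary statistics.
rating_summary <- trimmed_flavors_df %>%
  summarize(max_rating = max(Rating))

print(rating_summary)
```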

7. After completing your analysis of the rating system, you determine that any rating greater than or equal to 3.9 points can be considered a high rating. You also know that Chocolate and Tea considers a bar to be super dark chocolate if the bar’s cocoa percent is greater than or equal to 75%. You decide to create a new data frame to find out which chocolate bars meet these two conditions.

Assume the first part of your code is:

best_trimmed_flavors_df <- trimmed_flavors_df %>%

You want to apply the filter() function to the variables Cocoa.Percent and Rating. Add the code chunk that lets you filter the data frame for chocolate bars that contain at least 75% cocoa and have a rating of at least 3.9 points.

filter(Cocoa.Percent >= 75, Rating >= 3.9)

What value for cocoa percent appears in row 1 of your tibble?

  • 80%
  • 75% (Correct)
  • 88%
  • 78%

Correct: The code chunk filter(Cocoa.Percent >= 75, Rating >= 3.9) lets you filter the data frame for chocolate bars that contain at least 75% cocoa and have a rating of at least 3.9 points. The correct code is best_trimmed_flavors_df <- trimmed_flavors_df %>% filter(Cocoa.Percent >= 75, Rating >= 3.9). In this code chunk:

The filter() function lets you filter your data frame based on specific criteria.

Cocoa.Percent and Rating refer to the variables you want to filter.

The >= operator signifies “greater than or equal to.”

The new data frame will show all the values of Cocoa.Percent greater than or equal to 75, and all the values of Rating greater than or equal to 3.9. 

The value 75% for cocoa percent appears in row 1 of your tibble.
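The two conditions can be checked on a handful of invented rows. This sketch assumes Cocoa.Percent is stored as a number; if the column were stored as text such as "75%", the comparison would need to be adapted. dplyr is assumed to be installed.

```r
library(dplyr)

# Toy data frame; values are invented for illustration only.
trimmed_flavors_df <- data.frame(
  Rating = c(3.75, 4.0, 3.9, 4.25),
  Cocoa.Percent = c(80, 70, 75, 88)
)

# Conditions separated by a comma must all hold for a row to be kept.
best_trimmed_flavors_df <- trimmed_flavors_df %>%
  filter(Cocoa.Percent >= 75, Rating >= 3.9)

print(best_trimmed_flavors_df)
```

Only the last two toy rows satisfy both conditions: the first has a rating below 3.9 and the second has less than 75% cocoa.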

8. Now that you’ve cleaned and organized your data, you’re ready to create some useful data visualizations. Your team assigns you the task of creating a series of visualizations based on requests from the Chocolate and Tea management team. You decide to use ggplot2 to create your visuals.

Assume your first line of code is:

ggplot(data = best_trimmed_flavors_df) +

You want to use the geom_bar() function to create a bar chart. Add the code chunk that lets you create a bar chart with the variable Company.Location on the x-axis.

How many bars does your bar chart display?

  • 4
  • 6
  • 5 (Correct)
  • 3

Correct: You add the code chunk geom_bar(mapping = aes(x = Company.Location)) to create a bar chart with the variable Company.Location on the x-axis. The correct code is ggplot(data = best_trimmed_flavors_df) + geom_bar(mapping = aes(x = Company.Location)). In this code chunk:

geom_bar() is the geom function that uses bars to create a bar chart.

Inside the parentheses of the aes() function, the code x = Company.Location maps the x aesthetic to the variable Company.Location.

Company.Location will appear on the x-axis of the plot.

By default, R will put a count of the variable Company.Location on the y-axis.

Your bar chart displays 5 bars.
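To see how geom_bar() counts rows, the sketch below builds the same kind of chart from six invented rows and uses ggplot2's layer_data() to inspect the computed bars. It assumes ggplot2 is installed; the locations are placeholders, so the number of bars differs from the quiz answer.

```r
library(ggplot2)

# Toy data frame; locations are invented for illustration only.
best_trimmed_flavors_df <- data.frame(
  Company.Location = c("France", "Canada", "France",
                       "U.S.A.", "Canada", "France")
)

bar_plot <- ggplot(data = best_trimmed_flavors_df) +
  geom_bar(mapping = aes(x = Company.Location))

# layer_data() returns the values ggplot2 computed for the layer:
# one row per bar, with the row count in the "count" column.
bars <- layer_data(bar_plot)
n_bars <- nrow(bars)
print(n_bars)
```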

9. Your bar chart reveals the locations that produce the highest rated chocolate bars. To get a better idea of the specific rating for each location, you’d like to highlight each bar.

Assume that you are working with the following code:

ggplot(data = best_trimmed_flavors_df) +

  geom_bar(mapping = aes(x = Company.Location))

Add a code chunk to the second line of code to map the aesthetic fill to the variable Rating.

NOTE: the three dots (…) indicate where to add the code chunk.

According to your bar chart, which two company locations produce the highest rated chocolate bars?

  • Canada and France (Correct)
  • Scotland and Canada
  • Scotland and U.S.A.
  • Amsterdam and France

Correct: You add the code chunk fill = Rating to the second line of code to map the aesthetic fill to the variable Rating. The correct code is ggplot(data = best_trimmed_flavors_df) + geom_bar(mapping = aes(x = Company.Location, fill = Rating)). In this code chunk:

Inside the parentheses of the aes() function, after the comma that follows x = Company.Location, write the aesthetic (fill), then an equals sign, then the variable (Rating).

The specific rating of each location will appear as a specific color inside each bar of your bar chart.

On your visualization, the legend titled “Rating” shows the color coding for the variable Rating. Lighter blues correspond to higher ratings and darker blues correspond to lower ratings.

According to your bar chart, the two company locations that produce the highest rated chocolate bars are Canada and France.

10. Scenario 2, continued

A teammate creates a new plot based on the chocolate bar data. The teammate asks you to make some revisions to their code.

Assume your teammate shares the following code chunk:

ggplot(data = best_trimmed_flavors_df) +

     geom_bar(mapping = aes(x = Company)) +

What code chunk do you add to the third line to create wrap around facets of the variable Company?

  • facet_wrap(=Company)
  • facet_wrap(~Company) (Correct)
  • facet(Company)
  • facet_wrap(+Company)

Correct: You write the code chunk facet_wrap(~Company). In this code chunk:

facet_wrap() is the function that lets you create wrap around facets of a variable.

Inside the parentheses of the facet_wrap() function, type a tilde symbol (~) followed by the name of the variable (Company).
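A brief sketch of facet_wrap() on invented data, assuming ggplot2 is installed. Each distinct value of Company gets its own panel.

```r
library(ggplot2)

# Toy data frame; company names are invented for illustration only.
best_trimmed_flavors_df <- data.frame(
  Company = c("Maker A", "Maker B", "Maker A")
)

facet_plot <- ggplot(data = best_trimmed_flavors_df) +
  geom_bar(mapping = aes(x = Company)) +
  facet_wrap(~Company)

# The PANEL column of the computed layer data identifies the facet of each bar.
n_panels <- length(unique(layer_data(facet_plot)$PANEL))
print(n_panels)
```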

11. Scenario 2, continued

Your team has created some basic visualizations to explore different aspects of the chocolate bar data. You’ve volunteered to add titles to the plots. You begin with a scatterplot.

Assume the first part of your code chunk is:

ggplot(data = trimmed_flavors_df) +

     geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +

What code chunk do you add to the third line to add the title Best Chocolates to your plot?

  • labs(title = "Best Chocolates") (Correct)
  • labs(title <- "Best Chocolates")
  • labs("Best Chocolates" = title)
  • labs("Best Chocolates")

Correct: You write the code chunk labs(title = "Best Chocolates"). In this code chunk:

labs() is the function that lets you add a title to your plot.

In the parentheses of the labs() function, write the word title, then an equals sign, then the specific text of the title in quotation marks ("Best Chocolates").
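Put together, the three lines might look like the sketch below. The data values are invented, and ggplot2 is assumed to be installed.

```r
library(ggplot2)

# Toy data frame; values are invented for illustration only.
trimmed_flavors_df <- data.frame(
  Cocoa.Percent = c(70, 75, 80, 88),
  Rating = c(3.5, 3.9, 4.0, 3.75)
)

# labs() is added as a third layer, after the trailing + on the geom line.
titled_plot <- ggplot(data = trimmed_flavors_df) +
  geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +
  labs(title = "Best Chocolates")

# Printing the object draws the plot in the RStudio plots tab.
print(titled_plot)
```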

12. Scenario 2, continued

Next, you create a new scatterplot to explore the relationship between different variables. You want to save your plot so you can access it later on. You know that the ggsave() function defaults to saving the last plot that you displayed in RStudio, so you’re ready to write the code to save your scatterplot.

Assume your first two lines of code are:

ggplot(data = trimmed_flavors_df) +

     geom_point(mapping = aes(x = Cocoa.Percent, y = Rating))

What code chunk do you add to the third line to save your plot as a png file with chocolate as the file name?

  • ggsave("chocolate")
  • ggsave("png.chocolate")
  • ggsave(chocolate.png)
  • ggsave("chocolate.png") (Correct)

Correct: You write the code chunk ggsave("chocolate.png"). In this code chunk:

Inside the parentheses of the ggsave() function, type a quotation mark followed by the file name (chocolate), then a period, then the type of file format (png), then a closing quotation mark.

13. Scenario 2, continued

As a final step in the analysis process, you create a report to document and share your work. Before you share your work with the management team at Chocolate and Tea, you are going to meet with your team and get feedback. Your team wants the documentation to include all your code and display all your visualizations.

You want to record and share every step of your analysis, let teammates run your code, and display your visualizations. What do you use to document your work?

  • A data frame
  • A spreadsheet
  • A database
  • An R Markdown notebook (Correct)

Correct: You use an R Markdown notebook to document your work. The notebook lets you record and share every step of your analysis, lets your teammates run your code, and displays your visualizations.

14. You use the read_csv() function to import the data from the .csv file. Assume that the name of the data frame is flavors_df and the .csv file is in the working directory. What code chunk lets you create the data frame?

  • read_csv(flavors_df <- "flavors_of_cacao.csv")
  • read_csv("flavors_of_cacao.csv") <- flavors_df
  • flavors_df <- read_csv("flavors_of_cacao.csv") (Correct)
  • flavors_df + read_csv("flavors_of_cacao.csv")

Correct: The code chunk: flavors_df <- read_csv("flavors_of_cacao.csv") lets you create the data frame. In this code chunk:

flavors_df is the name of the data frame that will store the data.

<- is the assignment operator to assign values to the data frame.

read_csv() is the function that will import the data to the data frame.

"flavors_of_cacao.csv" is the file name that the read_csv() function takes for its argument.
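To try this without the real dataset, the sketch below first writes a two-row stand-in .csv file to a temporary folder and then imports it. The file contents are invented, and the readr package (part of the tidyverse) is assumed to be installed.

```r
library(readr)

# Create a tiny stand-in file; the real flavors_of_cacao.csv is much larger.
csv_path <- file.path(tempdir(), "flavors_of_cacao.csv")
writeLines(c("Rating,Cocoa.Percent", "3.75,70", "4.0,75"), csv_path)

# read_csv() returns a tibble, which <- stores in flavors_df.
# show_col_types = FALSE only silences the column-type message.
flavors_df <- read_csv(csv_path, show_col_types = FALSE)

print(flavors_df)
```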

15. Now that you’ve created a data frame, you want to find out more about how the data is organized. The data frame has hundreds of rows and lots of columns.

Assume the name of your data frame is flavors_df. What code chunk lets you get a glimpse of the contents of the data frame?

  • glimpse(flavors_df) (Correct)
  • glimpse %>% flavors_df
  • glimpse = flavors_df
  • glimpse <- flavors_df

Correct: You write the code chunk glimpse(flavors_df). In this code chunk:

glimpse() is the function that will give you a glimpse of the contents of the data frame, and give you high-level information like column names and the type of data contained in those columns.

flavors_df is the name of the data frame that the glimpse() function takes for its argument.

16. After previewing and cleaning your data, you determine what variables are most relevant to your analysis. Your main focus is on Rating, Cocoa.Percent, and Company.Location. You decide to use the select() function to create a new data frame with only these three variables.

Assume the first part of your code is:

trimmed_flavors_df <- flavors_df %>%

Add the code chunk that lets you select the three variables.

select(Rating, Cocoa.Percent, Company.Location)

What company location appears in row 1 of your tibble?

  • Canada
  • Scotland
  • France (Correct)
  • Colombia

Correct: You add the code chunk select(Rating, Cocoa.Percent, Company.Location) to select the three variables. The correct code is trimmed_flavors_df <- flavors_df %>% select(Rating, Cocoa.Percent, Company.Location). In this code chunk:

  • The select() function lets you select specific variables for your new data frame.
  • select() takes the names of the variables you want to choose as its argument: Rating, Cocoa.Percent, Company.Location.

The company location France appears in row 1 of your tibble.

17. Now that you’ve cleaned and organized your data, you’re ready to create some useful data visualizations. Your team assigns you the task of creating a series of visualizations based on requests from the Chocolate and Tea management team. You decide to use ggplot2 to create your visuals.

Assume your first line of code is:

ggplot(data = best_trimmed_flavors_df) +

You want to use the geom_bar() function to create a bar chart. Add the code chunk that lets you create a bar chart with the variable Rating on the x-axis.

geom_bar(mapping = aes(x = Rating))

How many bars does your bar chart display?

  • 5
  • 6
  • 3
  • 2 (Correct)

Correct: You add the code chunk geom_bar(mapping = aes(x = Rating)) to create a bar chart with the variable Rating on the x-axis. The correct code is ggplot(data = best_trimmed_flavors_df) + geom_bar(mapping = aes(x = Rating)). In this code chunk:

  • geom_bar() is the geom function that uses bars to create a bar chart.
  • Inside the parentheses of the aes() function, the code x = Rating maps the x aesthetic to the variable Rating.
  • Rating will appear on the x-axis of the plot.
  • By default, R will put a count of the variable Rating on the y-axis.

Your bar chart displays 2 bars.

18. Your team has created some basic visualizations to explore different aspects of the chocolate bar data. You’ve volunteered to add titles to the plots. You begin with a scatterplot.

Assume the first part of your code chunk is:

ggplot(data = trimmed_flavors_df) +

    geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +

What code chunk do you add to the third line to add the title Recommended Bars to your plot?

  • labs(title = Recommended Bars)
  • labs("Recommended Bars")
  • labs(title + "Recommended Bars")
  • labs(title = "Recommended Bars") (Correct)

Correct: You write the code chunk labs(title = "Recommended Bars"). In this code chunk:

labs() is the function that lets you add a title to your plot.

In the parentheses of the labs() function, write the word title, then an equals sign, then the specific text of the title in quotation marks ("Recommended Bars").

20. Next, you create a new scatterplot to explore the relationship between different variables. You want to save your plot so you can access it later on. You know that the ggsave() function defaults to saving the last plot that you displayed in RStudio, so you’re ready to write the code to save your scatterplot.

Assume your first two lines of code are:

ggplot(data = trimmed_flavors_df) +

    geom_point(mapping = aes(x = Cocoa.Percent, y = Rating))

What code chunk do you add to the third line to save your plot as a pdf file with “chocolate” as the file name?

  • ggsave(chocolate.pdf)
  • ggsave("pdf.chocolate")
  • ggsave("chocolate.pdf") (Correct)
  • ggsave("chocolate.png")

Correct: You add the code chunk ggsave("chocolate.pdf") to save your plot as a pdf file with "chocolate" as the file name. In this code chunk:

Inside the parentheses of the ggsave() function, type a quotation mark followed by the file name (chocolate), then a period, then the type of file format (pdf), then a closing quotation mark.
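A sketch of saving a plot from a script, assuming ggplot2 is installed. Because ggsave() defaults to the last plot displayed, this example passes the plot explicitly with the plot argument so it works even when nothing has been drawn yet. The temporary folder, the plot size, and the data values are choices made for this example.

```r
library(ggplot2)

# Toy data frame; values are invented for illustration only.
trimmed_flavors_df <- data.frame(
  Cocoa.Percent = c(70, 75, 80, 88),
  Rating = c(3.5, 3.9, 4.0, 3.75)
)

scatter_plot <- ggplot(data = trimmed_flavors_df) +
  geom_point(mapping = aes(x = Cocoa.Percent, y = Rating))

# The file extension (.pdf) tells ggsave() which format to write.
pdf_path <- file.path(tempdir(), "chocolate.pdf")
ggsave(pdf_path, plot = scatter_plot, width = 6, height = 4)
```

Note that ggsave() is called on its own line rather than being joined to the plot with a plus sign, since it saves a plot rather than adding a layer to it.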

21. A teammate creates a new plot based on the chocolate bar data. The teammate asks you to make some revisions to their code.

Assume your teammate shares the following code chunk:

ggplot(data = best_trimmed_flavors_df) +

    geom_bar(mapping = aes(x = Cocoa.Percent)) +

What code chunk do you add to the third line to create wrap around facets of the variable Cocoa.Percent?

  • facet_wrap(~Cocoa.Percent) (Correct)
  • facet(=Cocoa.Percent)
  • facet_wrap(%>%Cocoa.Percent)
  • facet_wrap(Cocoa.Percent~)

Correct: You write the code chunk facet_wrap(~Cocoa.Percent). In this code chunk:

  • facet_wrap() is the function that lets you create wrap around facets of a variable.
  • Inside the parentheses of the facet_wrap() function, type a tilde symbol (~) followed by the name of the variable (Cocoa.Percent).

22. You finish working with an R Markdown notebook and now you need to distribute your work. How can you export your analysis as a styled report?

  • Use the Contents Menu
  • Use markdown text
  • Use the Knit Button (CORRECT)
  • Use two hashtags

23. Fill in the blank: A delimiter is a character that indicates the beginning or end of _____.

  • a data item (CORRECT)
  • a header
  • a section
  • an analysis

24. A data analyst wants to change the default file format that gets exported by the Knit button in RStudio. Where should they change the output format?

  • In the Contents menu
  • In markdown text
  • In the YAML metadata (CORRECT)
  • In a code chunk

25. What delimiter is used to indicate the YAML metadata in an R Markdown notebook?

  • --- (CORRECT)
  • ```{r}
  • ```
  • ###

26. What type of export document should you use while you are working and don’t need to worry about adding page breaks in the correct places?

  • HTML (CORRECT)
  • Word
  • PDF
  • YAML

27. A data analyst wants to create a shareable report of their analysis with documentation of their process and notes explaining their code to stakeholders. What tool can they use to generate this?

  • Filters
  • Dashboards
  • R Markdown (CORRECT)
  • Code chunks

28. Fill in the blank: A data analyst includes _____ in their R Markdown notebook so that they can refer to it directly in their explanation of their analysis.

  • markdown
  • documentation
  • YAML
  • inline code (CORRECT)

29. A data analyst wants to add a bulleted list to their R Markdown document. What symbol can they type to create this formatting?

  • Brackets
  • Asterisks (CORRECT)
  • Hashtags
  • Delimiters

30. A data analyst is working in a .rmd file and comes across the text ```{r analysis}. What is the purpose of the text "analysis"?

  • It is a label for the code chunk (CORRECT)
  • It changes the way the code gets exported
  • It runs the code in analysis mode
  • It alters the output file format of Knit

31. What does the --- delimiter (three hyphens) indicate in an R Markdown notebook?

  • YAML metadata (CORRECT)
  • Bold text
  • Code chunk
  • Italic text

32. A data analyst is regularly exporting documents from a .rmd file and manually customizing the appearance of the document they give to stakeholders. What would allow them to automatically customize the appearance of the document?

  • A YAML header
  • A delimiter
  • An inline code snippet
  • A template (CORRECT)

33. A data analyst notices that their header is much smaller than they wanted it to be. What happened?

  • They have too few asterisks
  • They have too many hashtags (CORRECT)
  • They have too many asterisks
  • They have too few hashtags

34. A data analyst works with an .rmd file in RStudio and wants the ability to quickly find a code chunk using the label “analysis”. Which code example would allow the analyst to quickly access the code chunk using this label?

  • ```analysis{r}
  • ```{analysis r}
  • ```{r analysis} (CORRECT)
  • ```{r} analysis

35. A data analyst wants to make a word in their markdown stand out by making it bold. What characters should they surround the text with to achieve the bold style?

  • Angle brackets (<>)
  • Double asterisks (**) (CORRECT)
  • Double hashtag (##)
  • Single asterisk (*)
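The formatting rules tested in the questions above can be seen together in one small file. The following is a minimal sketch, not an official course example: it builds a hypothetical R Markdown document as a character vector and writes it to a temporary .Rmd file. The title, the chunk contents, and the names rmd_lines and rmd_path are assumptions made for illustration.

```r
# Build a minimal, hypothetical R Markdown document line by line.
rmd_lines <- c(
  "---",                                  # opening YAML delimiter
  "title: \"Chocolate ratings\"",         # hypothetical title
  "output: html_document",                # export format set in the YAML metadata
  "---",                                  # closing YAML delimiter
  "",
  "# Large header",                       # one hashtag: largest header
  "### Smaller header",                   # more hashtags: smaller header
  "",
  "* First bullet",                       # asterisks create a bulleted list
  "* Second bullet",
  "",
  "This word is **bold**.",               # double asterisks create bold text
  "",
  "```{r analysis}",                      # code chunk labeled "analysis"
  "mean_rating <- mean(c(3.5, 4.0))",
  "```",
  "",
  "The mean rating is `r mean_rating`."   # inline code inside markdown text
)

# Write the document to a temporary .Rmd file.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)
```

If the rmarkdown package and pandoc are installed, rmarkdown::render(rmd_path) should knit this file to HTML; that step is left out here so the sketch runs with base R alone.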

36. Scenario 1, questions 1-7

As part of the data science team at Gourmet Analytics, you use data analytics to advise companies in the food industry. You clean, organize, and visualize data to arrive at insights that will benefit your clients. As a member of a collaborative team, sharing your analysis with others is an important part of your job.

Your current client is Chocolate and Tea, an up-and-coming chain of cafes.

[Image: a creatively designed sign titled “Chocolate and Tea”]

The eatery combines an extensive menu of fine teas with chocolate bars from around the world. Their diverse selection includes everything from plantain milk chocolate, to tangerine white chocolate, to dark chocolate with pistachio and fig. The encyclopedic list of chocolate bars is the basis of Chocolate and Tea’s brand appeal. Chocolate bar sales are the main driver of revenue.

Chocolate and Tea aims to serve chocolate bars that are highly rated by professional critics. They also continually adjust the menu to make sure it reflects the global diversity of chocolate production. The management team regularly updates the chocolate bar list in order to align with the latest ratings and to ensure that the list contains bars from a variety of countries.

They’ve asked you to collect and analyze data on the latest chocolate ratings. In particular, they’d like to know which countries produce the highest-rated bars of super dark chocolate (a high percentage of cocoa). This data will help them create their next chocolate bar menu.

Your team has received a dataset that features the latest ratings for thousands of chocolates from around the world. Given the data and the nature of the work you will do for your client, your team agrees to use R for this project.

You create a short document about the benefits of using R for the project and share the document with your team. You write that the benefits include R’s ability to quickly process lots of data and easily reproduce and share an analysis. What is another benefit of using R for the project?

  • Create high-quality visualizations (CORRECT)
  • Define a problem and ask the right questions
  • Automatically clean data
  • Choose a topic for analysis

Correct: Another benefit of using R for the project is R’s ability to create high-quality data visualizations.

37. Scenario 1, continued

Before you begin working with your data, you need to import it and save it as a data frame. To get started, you open your RStudio workspace and load all the necessary libraries and packages. You upload a .csv file containing the data to RStudio and store it in your project folder under the name flavors_of_cacao.csv.

You use the read_csv() function to import the data from the .csv file. Assume that the name of the data frame is flavors_df and the .csv file is in the working directory. What code chunk lets you create the data frame?

  • flavors_df + read_csv("flavors_of_cacao.csv")
  • flavors_df <- read_csv("flavors_of_cacao.csv") (CORRECT)
  • read_csv("flavors_of_cacao.csv") <- flavors_df
  • read_csv(flavors_df <- "flavors_of_cacao.csv")

Correct: The code chunk flavors_df <- read_csv("flavors_of_cacao.csv") lets you create the data frame. In this code chunk:

  • flavors_df is the name of the data frame that will store the data.
  • <- is the assignment operator to assign values to the data frame.
  • read_csv() is the function that will import the data to the data frame.
  • "flavors_of_cacao.csv" is the file name that the read_csv() function takes as its argument.
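To try the import step without the real dataset, the sketch below writes a tiny stand-in file and reads it back. The two data rows, the company names, and the csv_path variable are hypothetical; only the file name, the column names, and the read_csv() call come from the question. It assumes the readr package (part of the tidyverse) is installed.

```r
library(readr)

# Hypothetical two-row sample standing in for the real flavors_of_cacao.csv.
csv_path <- file.path(tempdir(), "flavors_of_cacao.csv")
writeLines(
  c("Company,Cocoa.Percent,Rating",
    "Example Co,70,3.5",
    "Sample Chocolatier,75,4.0"),
  csv_path
)

# read_csv() returns a tibble; the <- operator assigns it to flavors_df.
flavors_df <- read_csv(csv_path, show_col_types = FALSE)
```

In the quiz, the file sits in the working directory, so the file name alone is enough; the full path is used here only because the sample is written to a temporary folder.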

38. After previewing and cleaning your data, you determine what variables are most relevant to your analysis. Your main focus is on Rating, Cocoa.Percent, and Company. You decide to use the select() function to create a new data frame with only these three variables.

Assume the first part of your code is: 

trimmed_flavors_df <- flavors_df %>%

Add the code chunk that lets you select the three variables.

What company appears in row 1 of your tibble?

  • A. Morin (CORRECT)
  • Rogue
  • Videri
  • Soma

Correct: You add the code chunk select(Rating, Cocoa.Percent, Company) to select the three variables. The correct code is trimmed_flavors_df <- flavors_df %>% select(Rating, Cocoa.Percent, Company). In this code chunk:

  • The select() function lets you select specific variables for your new data frame. 
  • select() takes the names of the variables you want to choose as its argument: Rating, Cocoa.Percent, Company.

The company A. Morin appears in row 1 of your tibble.
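As a rough illustration of how select() trims a data frame, the sketch below uses a small made-up tibble in place of the real data. The rows and the extra Bean.Type column are hypothetical; the select() call matches the answer above. It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical stand-in for flavors_df with one column that will be dropped.
flavors_df <- tibble(
  Company = c("Example Co", "Sample Chocolatier"),
  Bean.Type = c("Criollo", "Trinitario"),
  Cocoa.Percent = c(70, 75),
  Rating = c(3.5, 4.0)
)

# Keep only the three variables of interest, in the order listed.
trimmed_flavors_df <- flavors_df %>%
  select(Rating, Cocoa.Percent, Company)
```

The resulting tibble keeps every row but only the named columns, arranged in the order they were passed to select().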

39. Next, you select the basic statistics that can help your team better understand the ratings system in your data. 

Assume the first part of your code is:

trimmed_flavors_df %>%

You want to use the summarize() and mean() functions to find the mean rating for your data. Add the code chunk that lets you find the mean value for the variable Rating.

What is the mean rating?

  • 3.185933 (CORRECT)
  • 3.995445
  • 4.701337
  • 4.230765

Correct: You add the code chunk summarize(mean(Rating)) to find the mean value for the variable Rating. The correct code is trimmed_flavors_df %>% summarize(mean(Rating)). In this code chunk:

  • The summarize() function lets you display summary statistics. You can use the summarize() function in combination with other functions such as mean(), sd(), and max() to calculate specific statistics. 
  • In this case, you use mean() to calculate the mean value for the variable Rating.

The mean rating is 3.185933.
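The sketch below shows the same summarize() pattern on three made-up ratings, so the result will not match the quiz value. Naming the summary columns (mean_rating, sd_rating) is an optional choice made here for readability; the quiz answer leaves the column unnamed. It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical ratings; the real dataset has thousands of rows.
trimmed_flavors_df <- tibble(Rating = c(3.0, 3.5, 4.0))

# Compute two summary statistics in a single summarize() call.
rating_summary <- trimmed_flavors_df %>%
  summarize(
    mean_rating = mean(Rating),
    sd_rating = sd(Rating)
  )
```

summarize() collapses the data frame to one row, with one column per statistic requested.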

40. After completing your analysis of the rating system, you determine that any rating greater than or equal to 3.9 points can be considered a high rating. You also know that Chocolate and Tea considers a bar to be super dark chocolate if the bar’s cocoa percent is greater than or equal to 75%. You decide to create a new data frame to find out which chocolate bars meet these two conditions.

Assume the first part of your code is:

best_trimmed_flavors_df <- trimmed_flavors_df %>% 

You want to apply the filter() function to the variables Cocoa.Percent and Rating. Add the code chunk that lets you filter the data frame for chocolate bars that contain at least 75% cocoa and have a rating of at least 3.9 points.

What value for cocoa percent appears in row 1 of your tibble?

  • 80%
  • 88%
  • 75% (CORRECT)
  • 78%

Correct: The code chunk filter(Cocoa.Percent >= 75, Rating >= 3.9) lets you filter the data frame for chocolate bars that contain at least 75% cocoa and have a rating of at least 3.9 points. The correct code is best_trimmed_flavors_df <- trimmed_flavors_df %>% filter(Cocoa.Percent >= 75, Rating >= 3.9). In this code chunk: 

  • The filter() function lets you filter your data frame based on specific criteria. 
  • Cocoa.Percent and Rating refer to the variables you want to filter. 
  • The >= operator signifies “greater than or equal to.” 
  • The new data frame will contain only the rows where Cocoa.Percent is greater than or equal to 75 and Rating is greater than or equal to 3.9. 

The value 75% for cocoa percent appears in row 1 of your tibble.
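To see how the two conditions combine, here is a small sketch on made-up rows. It assumes Cocoa.Percent is stored as a number; if the column is stored as text such as "75%", it would likely need converting first (for example with readr::parse_number()). The company names and values are hypothetical, and the dplyr package is assumed to be installed.

```r
library(dplyr)

# Hypothetical rows chosen so that each condition excludes at least one bar.
trimmed_flavors_df <- tibble(
  Rating = c(4.0, 3.5, 4.0, 3.9),
  Cocoa.Percent = c(75, 80, 70, 88),
  Company = c("Company A", "Company B", "Company C", "Company D")
)

# Conditions separated by a comma must both be TRUE for a row to be kept.
best_trimmed_flavors_df <- trimmed_flavors_df %>%
  filter(Cocoa.Percent >= 75, Rating >= 3.9)
```

Company B is dropped for its rating and Company C for its cocoa percent, so only Company A and Company D remain.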

41. Now that you’ve cleaned and organized your data, you’re ready to create some useful data visualizations. Your team assigns you the task of creating a series of visualizations based on requests from the Chocolate and Tea management team. You decide to use ggplot2 to create your visuals. 

Assume your first line of code is:

ggplot(data = best_trimmed_flavors_df) +

You want to use the geom_bar() function to create a bar chart. Add the code chunk that lets you create a bar chart with the variable Rating on the x-axis.

How many bars does your bar chart display?

  • 6
  • 2 (CORRECT)
  • 5
  • 3

Correct: You add the code chunk geom_bar(mapping = aes(x = Rating)) to create a bar chart with the variable Rating on the x-axis. The correct code is ggplot(data = best_trimmed_flavors_df) + geom_bar(mapping = aes(x = Rating)). In this code chunk:

  • geom_bar() is the geom function that uses bars to create a bar chart. 
  • Inside the parentheses of the aes() function, the code x = Rating maps the x aesthetic to the variable Rating. 
  • Rating will appear on the x-axis of the plot. 
  • By default, R will put a count of the variable Rating on the y-axis.

Your bar chart displays 2 bars.
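The sketch below reproduces the bar chart code on a made-up data frame with two distinct rating values, which is why it also produces two bars. The ratings and the bar_plot name are hypothetical. It assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Hypothetical filtered data with two distinct rating values.
best_trimmed_flavors_df <- data.frame(Rating = c(4.0, 4.0, 4.0, 4.25))

# geom_bar() counts the rows at each value of Rating and draws one bar per value.
bar_plot <- ggplot(data = best_trimmed_flavors_df) +
  geom_bar(mapping = aes(x = Rating))
```

Printing bar_plot draws the chart; layer_data(bar_plot) returns the computed counts behind each bar if you want to check them.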

42. Your bar chart reveals the locations that produce the highest rated chocolate bars. To get a better idea of the specific rating for each location, you’d like to highlight each bar.

Assume that you are working with the code chunk:

ggplot(data = best_trimmed_flavors_df) +

  geom_bar(mapping = aes(x = Company.Location))

Add a code chunk to the second line of code to map the aesthetic color to the variable Rating.

NOTE: the three dots (…) indicate where to add the code chunk.

geom_bar(mapping = aes(x = Company.Location, …))

According to your bar chart, which two company locations produce the highest rated chocolate bars?

43. Scenario 2, continued

A teammate creates a new plot based on the chocolate bar data. The teammate asks you to make some revisions to their code.

Assume your teammate shares the following code chunk:

ggplot(data = best_trimmed_flavors_df) +

  geom_bar(mapping = aes(x = Company)) +

What code chunk do you add to the third line to create wrap around facets of the variable Company? 

  • facet_wrap(=Company)
  • facet_wrap(~Company) (CORRECT)
  • facet(Company)
  • facet_wrap(+Company)

Correct: You write the code chunk facet_wrap(~Company). In this code chunk:

  • facet_wrap() is the function that lets you create wrap around facets of a variable.
  • Inside the parentheses of the facet_wrap() function, type a tilde symbol (~) followed by the name of the variable (Company).
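A minimal sketch of the completed three-line chunk, using a made-up data frame with two companies. The company names and the facet_plot name are hypothetical. It assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Hypothetical data: two companies, one of which appears twice.
best_trimmed_flavors_df <- data.frame(
  Company = c("Company A", "Company A", "Company B")
)

# facet_wrap(~Company) draws one panel per distinct value of Company.
facet_plot <- ggplot(data = best_trimmed_flavors_df) +
  geom_bar(mapping = aes(x = Company)) +
  facet_wrap(~Company)
```

With this sample data the plot has two panels, one for each company.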

44. Scenario 2, continued

Your team has created some basic visualizations to explore different aspects of the chocolate bar data. You’ve volunteered to add titles to the plots. You begin with a scatterplot.

Assume the first part of your code chunk is:

ggplot(data = trimmed_flavors_df) +

    geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +

What code chunk do you add to the third line to add the title Recommended Bars to your plot?

  • labs(title + "Recommended Bars")
  • labs("Recommended Bars")
  • labs(title = Recommended Bars)
  • labs(title = "Recommended Bars") (CORRECT)

Correct: You write the code chunk labs(title = "Recommended Bars"). In this code chunk:

  • labs() is the function that lets you add a title to your plot.
  • In the parentheses of the labs() function, write the word title, then an equals sign, then the specific text of the title in quotation marks ("Recommended Bars").
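The complete scatterplot chunk can be sketched as follows on three made-up points. The values and the title_plot name are hypothetical. It assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Hypothetical cocoa percentages and ratings.
trimmed_flavors_df <- data.frame(
  Cocoa.Percent = c(70, 75, 80),
  Rating = c(3.5, 4.0, 3.75)
)

# labs(title = ...) adds the title as a third layer after the points.
title_plot <- ggplot(data = trimmed_flavors_df) +
  geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +
  labs(title = "Recommended Bars")
```

The title text must be a quoted string; without the quotation marks R would treat Recommended and Bars as object names and the code would fail.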

45. Scenario 1, continued

Before you begin working with your data, you need to import it and save it as a data frame. To get started, you open your RStudio workspace and load the tidyverse library. You upload a .csv file containing the data to RStudio and store it in your project folder under the name flavors_of_cacao.csv.

You use the read_csv() function to import the data from the .csv file. Assume that the name of the data frame is bars_df and the .csv file is in the working directory. What code chunk lets you create the data frame?

  • bars_df + read_csv("flavors_of_cacao.csv")
  • read_csv("flavors_of_cacao.csv") + bars_df
  • bars_df <- read_csv("flavors_of_cacao.csv") (CORRECT)
  • bars_df %>% read_csv("flavors_of_cacao.csv")

Correct: The code chunk bars_df <- read_csv("flavors_of_cacao.csv") lets you create the data frame. In this code chunk:

  • bars_df is the name of the data frame that will store the data. 
  • <- is the assignment operator to assign values to the data frame. 
  • read_csv() is the function that will import the data to the data frame. 
  • "flavors_of_cacao.csv" is the file name that the read_csv() function takes as its argument.

46. Next, you select the basic statistics that can help your team better understand the ratings system in your data. 

Assume the first part of your code is:

trimmed_flavors_df %>%

You want to use the summarize() and sd() functions to find the standard deviation of the rating for your data. Add the code chunk that lets you find the standard deviation for the variable Rating.

What is the standard deviation of the rating?

  • 0.3720475
  • 0.4780624 (CORRECT)
  • 0.4458434
  • 0.2951794

Correct: You add the code chunk summarize(sd(Rating)) to find the standard deviation for the variable Rating. The correct code is trimmed_flavors_df %>% summarize(sd(Rating)). In this code chunk:

  • The summarize() function lets you display summary statistics. You can use the summarize() function in combination with other functions such as mean(), max(), and min() to calculate specific statistics. 
  • In this case, you use sd() to calculate the standard deviation statistic for the variable Rating.

The standard deviation of the rating is 0.4780624.

47. After completing your analysis of the rating system, you determine that any rating greater than or equal to 3.5 points can be considered a high rating. You also know that Chocolate and Tea considers a bar to be super dark chocolate if the bar’s cocoa percent is greater than or equal to 70%. You decide to create a new data frame to find out which chocolate bars meet these two conditions. 

Assume the first part of your code is:

best_trimmed_flavors_df <- trimmed_flavors_df %>% 

You want to apply the filter() function to the variables Cocoa.Percent and Rating. Add the code chunk that lets you filter the data frame for chocolate bars that contain at least 70% cocoa and have a rating of at least 3.5 points.

What rating appears in row 1 of your tibble?

  • 4.25
  • 4.00
  • 3.75
  • 3.50 (CORRECT)

Correct: The code chunk filter(Cocoa.Percent >= 70, Rating >= 3.5) lets you filter the data frame for chocolate bars that contain at least 70% cocoa and have a rating of at least 3.5 points. The correct code is best_trimmed_flavors_df <- trimmed_flavors_df %>% filter(Cocoa.Percent >= 70, Rating >= 3.5). In this code chunk: 

  • The filter() function lets you filter your data frame based on specific criteria. 
  • Cocoa.Percent and Rating refer to the variables you want to filter. 
  • The >= operator signifies “greater than or equal to.” 
  • The new data frame will contain only the rows where Cocoa.Percent is greater than or equal to 70 and Rating is greater than or equal to 3.5. 

The rating 3.50 appears in row 1 of your tibble.

48. Now that you’ve cleaned and organized your data, you’re ready to create some useful data visualizations. Your team assigns you the task of creating a series of visualizations based on requests from the Chocolate and Tea management team. You decide to use ggplot2 to create your visuals. 

Assume your first line of code is:

ggplot(data = best_trimmed_flavors_df) +

You want to use the geom_bar() function to create a bar chart. Add the code chunk that lets you create a bar chart with the variable Rating on the x-axis.

How many bars does your bar chart display?

  • 5
  • 6
  • 3
  • 2 (CORRECT)

Correct: You add the code chunk geom_bar(mapping = aes(x = Rating)) to create a bar chart with the variable Rating on the x-axis. The correct code is ggplot(data = best_trimmed_flavors_df) + geom_bar(mapping = aes(x = Rating)). In this code chunk:

  • geom_bar() is the geom function that uses bars to create a bar chart. 
  • Inside the parentheses of the aes() function, the code x = Rating maps the x aesthetic to the variable Rating. 
  • Rating will appear on the x-axis of the plot. 
  • By default, R will put a count of the variable Rating on the y-axis.

Your bar chart displays 2 bars.

49. Scenario 2, continued

Your team has created some basic visualizations to explore different aspects of the chocolate bar data. You’ve volunteered to add titles to the plots. You begin with a scatterplot.

Assume the first part of your code chunk is:

ggplot(data = trimmed_flavors_df) +

    geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +

What code chunk do you add to the third line to add the title Best Chocolates to your plot?

  • labs(title <- "Best Chocolates")
  • labs("Best Chocolates" = title)
  • labs("Best Chocolates")
  • labs(title = "Best Chocolates") (CORRECT)

Correct: You write the code chunk labs(title = "Best Chocolates"). In this code chunk:

  • labs() is the function that lets you add a title to your plot.
  • In the parentheses of the labs() function, write the word title, then an equals sign, then the specific text of the title in quotation marks ("Best Chocolates").