# Course 2: Introduction to Data Analytics

## Coursera Study Guide

Click to Enroll in Coursera Meta Marketing Analytics Professional Certificate

## CONTENT

In week three, you’ll dig into how to clean and process data you’ve gathered using spreadsheets, SQL, and the Python Data Analytics Stack (Pandas).

## Learning Objectives

• Sort, query and structure data with spreadsheets
• Evaluate the limitations of spreadsheets as a means of cleaning data
• Sort, query and structure data with SQL
• Use Pandas DataFrames to clean and explore data
• Sort, query and structure data using Pandas, a Python library

## PRACTICE QUIZ: FILTERING DATA WITH GOOGLE SHEETS

### 1. If you have a list of numbers, and you’d like to arrange them from lowest to highest, you would:

• All of the above
• Visualize them
• Filter them
• Sort them (CORRECT)

Correct: That’s right! Sorting in a spreadsheet tool allows you to change how and in what order your data is presented.

### 2. True or false: Filtering data in a spreadsheet deletes the information that is no longer shown.

• True
• False (CORRECT)

Correct: That’s right! You can simply turn a filter off to show all of the data again.

### 3. Which function would you use to find the mean of a set of data in a spreadsheet?

• =MEAN()
• =AVERAGE() (CORRECT)
• =MEDIAN()

Correct: That’s right! Mean and average are often used interchangeably.

### 4. True or false: Once a spreadsheet has determined an appropriate visualization for your data, you can’t make changes to it.

• True
• False (CORRECT)

Correct: That’s right! You can make as many alterations and customizations as you need to a visualization.

## PRACTICE QUIZ: SQL SELECT STATEMENTS

### Which query will return the company, internet expenditure, and quarter in descending order of internet expenditure?

`````` SELECT Company, Internet
ORDER BY Internet DESC
``````
`````` SELECT Company, Internet, Quarter
ORDER BY Quarter DESC
``````
`````` SELECT Company, Internet, Quarter
``````
`````` SELECT Company, Internet, Quarter (CORRECT)
ORDER BY Internet DESC
``````

(CORRECT)

Correct: Good Job!. Remember that the columns you enter after the SELECT clause will be returned in the return-set. So Company, Internet, and Quarter data values will be in the return-set. The ORDER BY arranges these rows relative to the values in the Internet column with the largest value in the first row. (Fourth Option)

### 2. Which of the following queries will return the unique companies found in the table and arrange them in alphabetical order?

`````` SELECT Company
ORDER BY Company
``````
``````  SELECT DISTINCT(Company)
ORDER BY Company DESC
``````
``````  SELECT DISTINCT(Company)
``````
``````  SELECT DISTINCT(Company) (CORRECT)
ORDER BY Company
``````

(CORRECT)

Correct: Good Job! The DISTINCT addition after the SELECT clause will ignore duplicate values in the return-set which is what is meant by the word distinct. We also order by the Company so that the returned values are in alphabetical order.

### 3. What will the following query have in its return-set?

`````` SELECT Company, Quarter, TV
ORDER BY TV
LIMIT 4
``````
• The return-set will contain only the four companies that have spent the least amount of TV advertising.
• The return-set will have company, quarter, and TV expenditure values for the four rows that have the least TV advertising values. (CORRECT)
• The return-set will contain all the values found in the Company, Quarter, and TV columns.
• The return-set will contain the company, quarter, and TV values for the four rows that have the highest values in the TV column.

Correct: Good Job! Since Company, Quarter, and TV were all selected, these columns will show up in the return set arranged from smallest TV value to largest. Yet, only the first four rows will end up being in the return-set.

## PRACTICE QUIZ: SQL FILTERING + GROUPING

### Which query will return the quarters for which Calla & Ivy had a print advertising expenditure less than \$400?

`````` SELECT Quarter
WHERE Print = 400 AND Company = ‘Calla & Ivy’
``````
`````` SELECT Quarter
WHERE Print < 400
``````
`````` SELECT Print
WHERE Quarter < 400 AND Company = ‘Calla & Ivy’
``````
`````` SELECT Quarter (CORRECT)
WHERE Print < 400 AND Company = ‘Calla & Ivy’
``````

(CORRECT)

Correct: Well done! We select the column Quarter because those are the data values that we want returned. But we use the WHERE to restrict the return-set to only include values for Print that are less than 400 for the company Calla & Ivy.

### 2. Which query will get us the total TV expenditure in each quarter across all companies?

`````` SELECT Quarter, MAX(TV)
GROUP BY Quarter
``````
`````` SELECT Quarter, TV
GROUP BY Quarter
``````
`````` SELECT Quarter, SUM(TV)
GROUP BY TV
``````
`````` SELECT Quarter, SUM(TV) (CORRECT)
GROUP BY Quarter
``````

(CORRECT)

Correct: Good Job! This query will return the total amount spent on TV advertising in each quarter.

### 3. What will the following query have returned?

`````` SELECT AVG(Internet)
WHERE Company = ‘Calla & Ivy’
GROUP BY Quarter
``````
• The query will return Calla & Ivy’s average internet expenditure across all quarters.
• The query will return the internet expenditure for an average quarter at Calla & Ivy.
• The query will return Calla & Ivy’s total internet expenditure for each quarter.
• The query will return Calla & Ivy’s average internet expenditure for each quarter. (CORRECT)

Correct: Good job! This query will return the average internet expenditures for Calla & Ivy each quarter.

## PRACTICE QUIZ: FINDING AND REMOVING DUPLICATE RECORDS

### 1. Given that you have a Pandas DataFrame `df`, how would you export the DataFrame to the CSV file “my_data.csv”?

• df.to_csv(“my_data.csv”) (CORRECT)
• df.to_csv = “my_data.csv”
• df.save(“my_data.csv”)
• df[“my_data.csv”]

Correct: That’s right! To export the DataFrame to a CSV file, you would use `.to_csv()` with the name of the file as a parameter.

### What code would get you just the data for `Column 2`?

• df = ‘Column 2
• df
• df[‘Column 2’] (CORRECT)
• df(‘Column 2’)

Correct: That’s right! We can get just one column of a DataFrame using the square-bracket notation, similar to how we used keys in Python dictionaries.

### 3. What code would get you how many rows and columns are in a DataFrame `df`?

• df.shape()
• df.size
• df.size()
• df.shape (CORRECT)

Correct: That’s right! The code `df.shape` will get us the rows and columns of the DataFrame since `shape` is an attribute of the DataFrame.

## GRADED QUIZ: PANDAS AND SQL ASSESSMENT

### What code would you write to get all the rows where `subscriptionTier` has the word “Free”?

• df[ df[subscriptionTier].str.contains(“Free”) ] (CORRECT)
• df[subscriptionTier].contains(“Free”)
• df[ df[subscriptionTier].contains(“Free”) ]
• df[subscriptionTier].str.contains(“Free”)

Correct: Correct! We use square brackets and the .str.contains() method to filter the data.

### What code would I write to find out how many customers there are in each industry?

• df.groupby(‘industry’)[‘customerID’].count() (CORRECT)
• df[‘customerID’].groupby(‘industry’).count()
• df.groupby(‘customerID’).count()

Correct: That’s correct! You first group by `industry` and then using just the `customerID` column, we get the count.

### 4. You can find the list of columns with `df.columns()`

• T​rue
• False (CORRECT)

Correct: Correct! DataFrames have an attribute `columns` that can be accessed using “dot notation” to get a list of the columns’ names in the DataFrame. You don’t use parentheses.

### What code would you write to get all the rows where `industry` starts with the letters “Re”?

• df[ df[‘industry’].contains(“Re”) ]
• df[ df[‘industry’].str.contains(“Re”) ]
• df[ df[‘industry’].str.startswith(“Re”) ] (CORRECT)
• df[ df[‘industry’].startswith(“Re”) ]

Correct: Correct! We use square brackets and the .str.startswith() method to filter the data.

### What code would I write to find out how many customers there are that have the security add on?

• df.groupby(‘customerID’).count()

Correct: That’s correct! You first group by `addOn_security` and then using just the `customerID` column, we get the count.

### 7. Which code would you write to filter a DataFrame `df` so that you keep only the first instance of a duplicate?

• df[ ~df.duplicated() ] (CORRECT)
• df[ df.duplicated() ]
• df[ df.duplicated(keep=False) ]
• df[ ~df.duplicated(keep=False) ]

Correct: Correct!

### How would you interpret this output?

`````` df[ 'totalCosts' ].quantile(0.6)
4.99
``````
• 60% of the data has a totalCost of 4.99 or more
• 60% of the data has a value of 4.99 or less for the totalCost column (CORRECT)
• 60% of the data has a totalCost of exactly 4.99
• 40% of the data has a totalCost of 4.99 or less

Correct: Correct! The quantile method returns the cutoff value for the percentage given.

### What code would you write to get just the single column `totalCost`?

• df[‘totalCost’] (CORRECT)
• df.column[‘totalCost’]
• df.column(‘totalCost’)
• df(‘totalCost’)

Correct: Correct! We use square brackets and a string representing the column name to get just one column of the DataFrame.

### What code would I write to find out how many customers there are in each subscription tier?

• df.groupby(‘customerID’).count()
• df[‘customerID’].groupby(‘subscriptionTier’).count()
• df.groupby(‘subscriptionTier’)[‘customerID’].count() (CORRECT)
• df.groupby(‘customerID’)[‘subscriptionTier’].count()

Correct: That’s correct! You first group by `‘subscriptionTier’` and then using just the `customerID` column, we get the count.

### 11. You are given the following data loaded as a DataFrame `df` using Pandas. We know that the `customerID` column has a unique value (representing a different customer).

`````` df[ 'totalCosts' ].quantile(0.8)
8.99
``````
• 80% of the data has a value of 8.99 or less for the totalCost column (CORRECT)
• 20% of the data has a totalCost of exactly 8.99
• 80% of the data has a totalCost of 8.99 or more
• 20% of the data has a totalCost of 8.99 or less

Correct: Correct! The quantile method returns the cutoff value for the percentage given.

### What code would you write to get all the rows where `industry` starts with the letter “R”?

• df[ df[‘industry’].str.startswith(“R”) ] (CORRECT)
• df[‘industry’].str.starts(“R”)
• df[ df[‘industry’].startswith(“R”) ]
• df[ df[‘industry’].starts(“R”) ]

Correct: Correct! We use square brackets and the .str.startswith() method to filter the data.

### 13. Which code would you write to filter a DataFrame `df` so that you only have values that had no duplicate rows?

• df[ ~df.duplicated(keep=False) ]
• df[ df.duplicated(keep=’False’) ]
• df[ ~df.duplicated(keep=’last’) ] (CORRECT)
• df[ df.duplicated(keep=’last’) ]

Correct:Correct!

### What code would I write to find out how many customers there are in each subscription tier?

• df.value_counts()[‘subscriptionTier’]
• df[‘subscriptionTier’].values
• df[‘subscriptionTier’].value_counts() (CORRECT)
• len(df[‘subscriptionTier’])

Correct: That’s correct! You first get the `subscriptionTier` column and then using the `value_counts()` method we can get the number of customers there are in each group.

### What code would you write to get all the rows where `totalCost` is less than \$6?

• df[‘totalCost’] < 6.00
• df[‘totalCost’].less_than(6.00)
• df[ df[‘totalCost’].less_than(6.00) ]
• df[ df[‘totalCost’] < 6.00 ] (CORRECT)

Correct: Correct! We use square brackets to enclose a mask which uses conditional statements to filter the data.

### 16. Which ways can you filter data using Google Sheets? Select all that apply.

• By Font Size
• By Color (CORRECT)
• By Condition (CORRECT)
• By Values (CORRECT)

Correct: Correct! This feature is useful if you’ve already color coded your dataset, but otherwise requires an additional step.

Correct: Correct! You saw filtering by condition using “text contains” conditions, “text does not contain” conditions, and “is greater than or equal to” conditions.

Correct: Correct! You can select specific column values which you wish to exclude by clicking them from the filter menu. You will see the check next to these values removed in the menu. Then click OK to apply the filter.

### 17. What will the following code give in the return set?

`````` SELECT DISTINCT company

ORDER BY company DESC

LIMIT 5
``````
• It returns the list of all companies in reverse alphabetical order.
• It returns the first five entries from the list of companies in alphabetical order.
• It returns the first five entries from the list of companies in reverse alphabetical order. (CORRECT)
• It returns the list of all companies in alphabetical order.

Correct: Excellent! This SQL query creates a list of the unique companies, orders them in reverse-alphabetical order, then returns the top five from that list.

### What are the resulting data for the SQL query:

`````` SELECT company, quarter, internet
WHERE internet >= 500
``````
• The return-set will consist of the Company, Quarter, and Internet column values for only those rows where the internet value is 500 or greater. (CORRECT)
• The return-set will consist of the Internet column values for only those rows where the internet value equals 500.
• The return-set will consist of the Company, Quarter, and Internet column values for only those rows where the internet value is less than 500.
• The return-set will consist of the Company and Internet column values for only those rows where the internet value is 500 or greater.

Correct: Excellent! This SQL command returns the data values in the columns for every row that has a value of 500 or more in the internet column.

### 19. The tenure column records how many months an individual has been a customer. Which code blocks would correctly filter the data to individuals who had been a customer for a year or more? Select all that apply.

• 1 mask = df[ ‘Tenure’ ] == 12
• 1 df[df[ ‘Tenure’] >= 12] (CORRECT)
• 1 mask = df[ ‘Tenure’ ] > 11 (CORRECT)
• 1 df[df[ ‘Tenure’] >= 12] (CORRECT)

Correct: Correct! This code filters the dataset in one line. Inside the square brackets is the mask, or conditional comparison.

Correct: Correct! This code block does subset the data to customers who have subscribed for a year or more. Using >= 12 like options A and D might be clearer to read though.

Correct: Correct! In the first line, you create a conditional mask based on the Tenure column. Then you apply this mask to filter the DataFrame in the second line.

### 20. Which code would correctly remove all duplicate values? Select all that apply.

• 1 deduplicated = df.filter(df.duplicated())
• 1 deduplicated = df[df.duplicated()]
• 1 mask = ~df.duplicated() (CORRECT)
• 1 deduplicated = df[~df.duplicated()] (CORRECT)

Correct: Correct! This code will remove all duplicate records. The first record from every duplicate will be kept.

Correct: Correct! This code will remove all duplicate records. The first record from every duplicate will be kept, and it is written all as one line.

### 21. Which code correctly groups customers by subscription tier and returns the total monthly sales from each tier group?

• df.groupby([‘subscriptionTier’, ‘totalCost’]).sum()
• df.groupby(‘subscriptionTier’)[‘totalCost’].mean()
• df.groupby(‘totalCost’)[‘subscriptionTier’].sum()
• df.groupby(‘subscriptionTier’)[‘totalCost’].sum() (CORRECT)

Correct: Correct! This code groups the customer dataset by subscriptionTier and calculates the total sales for Focus Vu customers in each of these subscription tiers.

### 22. How would you interpret this output? Select all that apply.

``````1 df['totalCosts'].quantile(.75)
2 7.99
``````
• 25% of the data has a totalCosts of 7.99 or less.
• 75% of the data has a totalCost of 7.99 or more.
• 25% of the data has a totalCosts of 7.99 or more. (CORRECT)
• 75% of the data has a value of 7.99 or less for the totalCosts column. (CORRECT)

Correct: Correct! The quantile method tells you that 75% of the data has a totalCost of 7.99 or less. Therefore the remaining 25% has a totalCost of 7.99 or more.

Correct: Correct! The quantile method returns the cutoff value for the percentage given. .75 is 75% So this query tells us that 75% of the data has a totalCosts of 7.99 or less.

#### Subscribe to our site

Get new content delivered directly to your inbox.

## Quiztudy Top Courses

Liking our content? Then, don’t forget to ad us to your BOOKMARKS so you can find us easily!