COURSE 1 – PREPARING DATA FOR ANALYSIS WITH MICROSOFT EXCEL

Module 3: Preparing Data for Analysis Using Functions

MICROSOFT POWER BI DATA ANALYST PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

Last updated:

INTRODUCTION – Preparing data for analysis using functions

The module provides an in-depth introduction to a variety of commonly used functions designed to prepare Excel data for analysis in sophisticated tools like Power BI. It covers essential techniques and methodologies that ensure data is properly formatted, cleansed, and structured, facilitating seamless integration and enhanced analytical capabilities. By mastering these functions, users can optimize their data workflows, making it easier to perform complex data analysis and derive actionable insights using Power BI.

Learning Objectives

  • Use functions to prepare text data for effective analysis.
  • Use functions to create timeline information in a spreadsheet.
  • Generate new content using logical functions.

SELF-REVIEW: STANDARDIZING TEXT-BASED DATA

In the Standardizing text-based data exercise, you created calculations to standardize the data in the Adventure Works spreadsheet. You also used the functions TRIM, UPPER, PROPER, LEFT, MID, RIGHT, and CONCAT to clean the data. 

The results you produced should be similar to those in the following screenshot:

AD 4nXfwRxO8KyRWx hoDYPdC8ZOKvHnuMGTKJ9OG4cm8LwWcmskdgwJNqdGVVmLAqPvib79IOQ LVrtJvzHcFgXQlIna8aHTEOLEO90WuPg4PyDJgBSa13WbNn9CJC

Answer the questions that follow to test your understanding of the processes. Remember that you can refer to previous lesson items if required.

1. In your worksheet, you created formulas in column C that included the TRIM function and cell references from column B as arguments. The entries in column B contained a lot of unnecessary spaces though the entries contain only a single space between each word. What did the formulas remove?

  • The spaces before, after, and in the entries.
  • The spaces before and after the entries. (CORRECT)
  • The spaces before the entries only.

That’s correct! The TRIM function removes the additional spaces at the beginning and end of the entries. However, if there is a single space between words these are not removed. 

2. True or False: You created formulas in cells K2 to K200 to combine the content from column G and column l. You were then able to delete columns G and I immediately, as they were no longer required.

  • True
  • False (CORRECT)

That’s correct! Once the formulas were created in column K, it was necessary to Copy and then Paste values in column K to ensure that the formulas were replaced with the values generated.

3. In the CONCAT formula in K2, you added a third argument to indicate that the two words needed space between them. What character did you input on either side of the space in the formula?

  • A double quote. (CORRECT)
  • A parenthesis.
  • An exclamation mark.

That’s correct. A double quote character on either side of the space character indicated that the space was to be added as an additional character to the result.

4. Jamie wants to perform a time analysis of Adventure Works’ sales data. However, a time analysis isn’t possible because of how the date is currently formatted. What error might have caused this formatting issue?

  • The dates are typed with forward slashes.
  • The dates are typed as text. (CORRECT)
  •  The dates are typed as numbers.

That’s correct. For Excel to recognize dates as calendar items, they must be typed as numbers.

KNOWLEDGE CHECK: USING FUNCTIONS TO CLEAN OR STANDARDIZE TEXT

1. True or False: An Excel spreadsheet contains country names split over multiple columns. Cell A2 contains the word United. Cell B2 contains the word States. When executed, the following formula generates the result United States:

=CONCAT(A2,” “,B2)

  • True (CORRECT)
  • False

That’s correct! The CONCAT function formula brings together the contents of A2 and B2 and includes an additional argument that adds the space character to the result. The space character indicated is in double quotes to indicate that it is text to add.

2. One of the employees at Adventure Works has made some typing errors in a spreadsheet. A text entry in cell contains the following text:

aDVENTURE wORKS rESELLERS

Which function should you use in a formula to copy this entry so that it is in lowercase with a capital letter at the beginning of each word?

  • LOWER
  • PROPER (CORRECT)
  • UPPER

That’s correct! The PROPER function generates a result where the first letter of each word is in capitals, but the remainder of the word is in lowercase.

3. In the Adventure Works Reseller spreadsheet, the reseller names are listed in column D. Cell D2 contains the following text, which is left-aligned in the cell with no redundant spaces:

EastBike Shop

What is the result of this formula?

=MID(D2,5,4)

  • East
  • Shop
  • Bike (CORRECT)

That’s correct! The second argument, 5, asks Excel to move to the B. The argument of 4 then asks it to take that letter and the three letters that follow as its result. 

4. Cell A2 contains the following entry:

aceE6548.

What result would the following formula generate when applied to the above entry?

  • aCE6548
  • Ace6548.
  • ACEE6548 (CORRECT)

That’s correct! The UPPER function changes the letters into uppercase.

5. Cell E4 contains the city name North Miami Beach. What result would the following formula generate in your worksheet?

=RIGHT(E4,5)

  • Miami (CORRECT)
  • Beach
  • North

That’s correct. The RIGHT function in the formula would start at the right edge of the entry and count five characters to the left.

SELF-REVIEW: CALCULATING THE NUMBER OF WORKING DAYS REMAINING IN THE YEAR

In the exercise Calculating the number of working days remaining in the year, you created calculations to add new date and time information to a worksheet called USA Launch Dates in a workbook named Advertising Campaign USA. In the formulas you created, you used the functions TODAY, NETWORKDAYS, MONTH, and YEAR to calculate the required information.

Your final workbook should resemble the following screenshot:

AD 4nXfg3Wc6Uw2nXkSQsCe19GIjFiXIEPXgcGsTFZelWrQfk3d9Y Dj66PW7pfkFqy fBCFXvVp4 hYle04r2ipdJ2UhXxWB37QIJFZLY868VuT7saMG2pORj4Die4E2Qmb7t Z 7KHKABvJv1WO 1dfiQvSNgxFciBXJHueoF2Iwr7Iq2

Now it’s time to review your understanding of the tasks you completed by answering the following questions. Don’t forget that you can revisit the previous learning items to recap the process steps.

1. In the spreadsheet, you were asked to create a formula using the MONTH function. The formula was

:=MONTH(D5)

Cell D5 contained the entry 07/02/23. This date is in American format. What was the result of this calculation?

  • 2023
  • 7 (CORRECT)
  • 2

That’s correct! The MONTH function identifies and displays the month element of a date. In this case, the month is 7.

2. True or False: If you were to use a TODAY function in a formula in cell B1, the result would change each time the formulas in the worksheet recalculate.

  • True
  • False (CORRECT)

That’s correct! The TODAY function uses the system clock on the PC to track time. However, it only generates a new result when a full 24 hours have elapsed since the last updated value.

3. You created a formula using the NETWORKDAYS function. You added two arguments for the function, which were the start date, and the end date. What data did the formula automatically exclude from the result?

  • Any weekend date (CORRECT)
  • Any holiday date
  • Any weekend and holiday date

That’s correct! If the function contains only the start date and end date as arguments, it will apply its default setting, which is to exclude any date that falls on a weekend. It uses the country setting on the PC to determine what constitutes a weekend.

4. The date 05/30/23 has a serial number of 45076. What serial number will the date 05/31/23 have?

  • 45077 (CORRECT)
  • 45078
  • 45079

That’s correct! Excel increases the serial number by 1 for every 24-hour period.

5. How many arguments does the TODAY function require?

  • Two
  • None (CORRECT)
  • One

That’s correct! The TODAY function uses the computer’s internal calendar to generate its result, so it doesn’t require arguments.

KNOWLEDGE CHECK: DATE AND TIME FUNCTIONS

1. You have created a formula in your spreadsheet using the TODAY function. What must you include after the word TODAY in your formula?

  • An opening and closing parenthesis only. (CORRECT)
  • An opening and closing parenthesis and one space.
  • An opening and closing parenthesis and two spaces.

That’s correct! The TODAY function does not require any arguments. However, because of the way that functions are designed, it still requires an opening and closing parenthesis after the function name.

2. You have created a formula in your spreadsheet using the NOW function. By default, what will the formula display in its result?

  • The time only.
  • The date only.
  • The date and time. (CORRECT)

That’s correct! By default, the NOW function formula generates a result that includes the current date and time. You can customize this display by using the time format choices in the number section of the home ribbon.

3. You are working on a spreadsheet that contains three columns called Day, Month and Year. In another column, you would like to combine these entries so that it shows a complete date. Which function should you use to achieve this?

  • DATEDIF
  • DATE (CORRECT)
  • CONCAT

That’s correct! The DATE function allows you to combine three numbers into a single date. The result will also be a date. 

4. True or False: You can use the NETWORKDAYS.INTL function to calculate the number of working days between two dates while excluding national holidays and weekends because it has built-in knowledge of public holidays.

  • True
  • False (CORRECT)

That’s correct! The NETWORKDAYS.INTL function does not have any built-in knowledge of public or national holidays. These must be listed on the spreadsheet, and the range reference is used as a third argument in the formula. The NETWORKDAYS.INTL function also allows you to instruct Excel to ignore your country settings and use different days for the weekend. 

5. You have entered the following three dates in your spreadsheet in the month, day and year format. Which of these dates has the largest serial number?

  • 01/30/2023
  • 09/10/2025 (CORRECT)
  • 04/15/2020

That’s correct! Excel assigns each day a serial number and increments the number by 1 each day. A date in the future will have a larger serial number than a date in the past.

SELF-REVIEW: ADDING A DATA COLUMN USING THE IFS FUNCTION

In the exercise Adding a data column using the IFS function, you created calculations to generate two columns of new data in the Order Details spreadsheet in the workbook named Contoso Bikes. You generated this data using the IFS and IFS functions. You also used the function SUMIF to generate regional totals in the spreadsheet. 

The results you produced should be like those in the following screenshot:

AD 4nXcnKCOSxFT8Qw o HXYJAqQq6HPR7YLYih4NTc mCi aujA1U1RIoCi1OZT3mKE D2nmFaefYc0ONymhOtIEIOecrjErlqU5iddPi2HnKQtrrf8Mm0 Z8RXIglzqexDJG5aJkDXQCwVjapw6PYFE3UJ7GgMIlqi29609 rRGNFpjgLD

Now it’s time to review your understanding of the tasks you completed by answering the questions that follow. Don’t forget that you can revisit the previous learning items to recap the process steps.

1. You created an IF function formula in cell H7 that was designed to display the correct percentage discount if the amount in G7 was greater than $10,000. In what format did you enter the percentage in the Value if true section of the formula?

  • “10%”
  • 10
  • 10% (CORRECT)

That’s correct! Typing 10% in the “value if true” section of the formula means Excel displays this entry exactly if the logical test returns TRUE.

2. True or False: When you created a formula in cell L7 to check and display the delivery charge for the region, you used IFS rather than IF because you needed to run more than two tests but also create the most concise formula.

  • True (CORRECT)
  • False

That’s correct! A single IF function could only perform a test for two regions. In this situation, choosing to use IF would require you to create a more complex formula with multiple nested functions. On the other hand, because IFS can perform multiple sequential tests to allow for three regions, using it will allow you to create a more streamlined formula.

3. You created a formula in H2 which used SUMIF to obtain a total for Region A entries only. What did you need to specify as the first argument in your formula?

  • The criteria range (CORRECT)
  • The criteria entry
  • The SUM range

4. You have created a nested IF formula using three IF functions. How many closing parentheses do you need at the end of the formula?

  • Three (CORRECT)
  • One
  • Two

That’s correct. Each IF function needs its own opening and closing parentheses. You have used three IF functions in the formula, so there must be three closing parentheses at the end.

KNOWLEDGE CHECK: LOGICAL FUNCTIONS

1. Cell A2 of your worksheet contains a value of 250. What is the result of the following formula when added to your worksheet?

=IF(A2>300,10%,IF(A2>200,5%,0%))

  • 10%
  • 0%
  • 5% (CORRECT)

That’s correct. The logical test of the first IF fails, so Excel processes the second IF next. The second logical test returns a value of TRUE. So, the formula displays the value if true entry for the second IF.

2. You create a formula using the IFS function to test for a series of alphabet characters. When typing the criteria to test for, what symbols should you add around each text character?

  • Parentheses.
  • Single quotation marks.
  • Double quotation marks. (CORRECT)

That’s correct. Any piece of text included in the formula arguments needs to be enclosed within double quotation marks so that Excel recognizes that it is a piece of text.

3. In your worksheet, cell A2 contains a value of 100. Cell B2 contains a value of 200, and C2 contains a value of 400. What is the result of the following formula when added to your worksheet?

=IF(OR(A2>=200,B2>=200),”Result 1″,IF(C2>300,”Result 2″,”Result 3″))

  • Result 2
  • Result 1 (CORRECT)
  • Result 3

That’s correct! The first IF function uses the OR function to perform two logical tests. This means that only one of the tests needs to be TRUE for the IF to return the first value if true message. The formula doesn’t need to process the second IF function.

4. You create the following formula using the AVERAGEIF function: 

=AVERAGEIF(A2:A50,”Chicago”,C2:C50)

What does the first argument of this function represent?

  • The average range.
  • The criteria.
  • The criteria range. (CORRECT)

That’s correct! The first argument is called the criteria range. This is where the AVERAGEIF function identifies all occurrences of the criteria. 

5. In your worksheet, cell A2 contains the value 100. Cell B2 contains a value of 200, and C2 contains a value of 400. What is the result of the following formula when added to your worksheet?

=IFS(A2>200,”Rate 1″,B2>200,”Rate 2″,C2>200,”Rate 3″,TRUE,0)

  • Rate 1
  • Rate 2
  • Rate 3 (CORRECT)

That’s correct! The first two logical tests fail, but the third logical test returns a result of TRUE. So, the IFS function returns the third value if true message.

MODULE QUIZ: FORMULAS AND FUNCTIONS

1. True or False: In your worksheet, cell A2 contains the word United. B2 contains the word States. You add the following formula to your worksheet:

=CONCAT(A2,B2)

When executed, the result of this formula is:

United States

  • True
  • False (CORRECT)

That’s correct. The formula would not include a space between the words because there is no additional argument in the formula to request this. 

2. You are working on a spreadsheet that contains a column of customer names. You notice that there are a lot of extra spaces both before and after the entries. Which function could you use to tidy and standardize the customer name entries? 

  • RIGHT 
  • TRIM (CORRECT)
  • PROPER

That’s correct! The TRIM function removes redundant spaces before or after a cell entry. It does not remove the necessary spaces between words.

3. Some of the information in your worksheet has been typed in the wrong columns. The entry in C2 incorrectly reads as: 

32MainAvenueChicagoUSA

What is the result of the following formula which references this entry?

=MID(C2,3,10)

  • MainAvenue (CORRECT)
  • 32MainAven
  • ChicagoUSA

That’s correct! The second argument, 3, instructs Excel to move in 3 characters from the left. The third argument, 10, asks Excel to display the next 10 characters.

4. When typing time entries into a cell, what character can you use to separate the hours and minutes?

  • A colon. (CORRECT)
  • A comma.
  • A semi-colon.

That’s correct! The colon symbol is used to separate hours, minutes, and seconds in a time entry.

5. Cell D2 in your spreadsheet contains the date 05/30/23. The date is in MM/DD/YY format. What result does the following formula display?

=DAY(D2)

  • 5
  • 23
  • 30 (CORRECT)

That’s correct! The DAY function identifies the day element in a date and displays that element as its result.

6. You are working on a spreadsheet that contains employee information. Column A lists the date staff members started with the company. Column B lists the date they left the company. Cell A2 lists an employee’s start date as 05/30/16, and cell B2 lists their end date as 05/30/23.

What result does the following formula display?

=DATEDIF(A2,B2,”Y”)

  • 5
  • 6
  • 7 (CORRECT)

That’s correct. The DATEDIF function calculates the interval between two dates. The Y argument asks Excel to express the result as whole years.

7. True or False: Cell A2 contains a value of 100. B2 contains a value of 200. When executed, the following formula, which references these cells, displays a result of TRUE:

=AND(A2>=100,B2>=250)

  • True
  • False (CORRECT)

8. You must create an IF formula that checks the values in three cells and displays a “value if true” message of “target met” if any of the three values is over 1,000. Which function can you “nest” inside the IF function of your formula to complete this task?

  • An AND function.
  • An OR function. (CORRECT)
  • An IF function.

That’s correct! The OR function requires only one of the values in the three cells to be over 1,000 before it directs the IF function to display the “value if true” message of “target met”.

9. Cell A2 in your spreadsheet contains a value of 200. Which result does the following formula, which references this cell, display?

=IF(A2<>100,”FirstMessage”,IF(A2>300,”SecondMessage”,”Third Message”))

  • First Message (CORRECT)
  • Third Message
  • Second Message

That’s correct! The first logical test checks that the value in cell A2 is not equal to 100. The result of this first test is TRUE. So, the formula displays the message “First Message” and does not move on to the second IF.

10. In your customer details worksheet, you’ve noticed that cell A2 contains the entry mARY gOMEZ. What is the result of the following formula which references this cell?

  • =PROPER(A2)
  • MARY GOMEZ
  • mary gomez
  • Mary Gomez (CORRECT)

That’s correct! The PROPER function displays the text in A2 as normal text with a capital letter at the beginning of each word.

11. Cell A2 of your spreadsheet contains the date 05/30/23. Cell B2 contains the date 06/01/23. Both dates are in the MM/DD/YY format. Tip: If your machine is set to a different region, entering dates in the MM/DD/YY format will generate an error. Enter the dates in the format required for your region. For example, if you live in Europe the format would be DD/MM/YY.

What is the result of the following formula which references these cells?

=B2-A2

  • 3
  • 2 (CORRECT)
  • 1

That’s correct. When Excel subtracts the serial number for  05/30/23 from the serial number for 06/01/23, the answer is 2. This is because Excel does not include the serial number for 30/05/23 in the result.

12. True or False: Cell A2 contains a value of 100. B2 contains a value of 200. When executed, the following formula which references these cells, displays the result TRUE.

=OR(A2>=100,B2>=250)

  • True (CORRECT)
  • False

That’s correct! The OR function only requires one of the logical tests to return a value of TRUE before it generates an overall result of TRUE. In this case, the formula generates the result TRUE because the first logical test returns a value of TRUE.

13. You need to create an IF formula that runs a series of tests. If one test fails, the IF formula must move to the next test. Which function do you need to “nest” inside the IF function in this formula to complete this task?

  • An AND function.
  • An IF function. (CORRECT)
  • An OR function.

That’s correct! The tests need to be performed sequentially instead of at the same time. So a second IF function must be “nested” as the “value if false” entry for the first IF function. 

14. Cell A2 in your spreadsheet contains a value of 150. Which message does the following formula display when executed?

=IFS(A2=50,”First Message”,A2=100,”Second Message”,A2=150,”Third Message”,TRUE,”No Message”)

  • Second Message
  • First Message
  • No Message
  • Third Message (CORRECT)

That’s correct! The results of the first and second tests, A2=50 and A2=100, are both FALSE. However, the third logical test, A2=150, is TRUE. So, Excel displays “Third Message” as the formula result.

15. Columns A, B and C of your worksheet contain numeric entries. The columns are called Day, Month and Year. What type of data does the following formula generate when added to your worksheet?

=CONCAT(A2,B2,C2)

  • Numeric
  • Date
  • Text (CORRECT)

That’s correct! Even though the entries are numeric, the use of the CONCAT function in the formula transforms them into text.

16. Cell B2 of your worksheet contains the following entry:

mOUNTAIN Bike

What is the result of the following formula which references this cell?

=LOWER(B2)

  • mountain bike (CORRECT)
  • MOUNTAIN BIKE
  • Mountain Bike

That’s correct! The LOWER function changes all letters in the text entry to lowercase.

17. Some of the information in your worksheet has been typed in the wrong columns. The entry in C2 incorrectly reads as:

32MainAvenueChicagoUSA

What is the result of the following formula which references this entry?

  • =LEFT(C2,12)
  • 32Main
  • 32MainAvenue (CORRECT)
  • 32MainAvenueChicago

That’s correct! The LEFT function counts the number of characters from the left edge. In this case, the formula asks Excel to display the 12 characters starting from the left of the entry.

18. In your worksheet, cell A2 contains a value of 150. Cell B2 contains a value of 200. Cell C2 contains a value of 300.

The following formula, which contains three logical tests, has returned a result of FALSE. Which of these logical tests failed?

=AND(A2>100,B2>150,C2>350)

  • The first logical test.
  • The third logical test. (CORRECT)
  • The second logical test.

That’s correct. The values in A2 and B2 meet the criteria of the first two logical tests. However, the value in C2 does not meet the criteria in the third logical test. So, this component causes the formula to return a result of FALSE.

19. You must create an IF formula that checks the values in three cells. It must display a “value if true” message of “target met” if all three values exceed 1,000. Which function should you “nest” inside the IF function in this formula to complete this task?

  • An IF function.
  • An AND function. (CORRECT)
  • An OR function.

That’s correct! The AND function requires that all the values in the three cells be over 1,000 before it directs the IF function to display the “value if true” message of “target met”.

20. In your worksheet, cell A2 contains the word Super. Cell B2 contains the word Cycles. You add the following formula to your worksheet:

=CONCAT(A2,” “,B2,” “, “Inc.”)

What is the result of this formula?

  • SuperCyclesInc
  • Super Cycles Inc. (CORRECT)
  • Super CyclesInc

That’s correct! The two spaces between the words have been added as extra arguments surrounded by double quotes. The suffix Inc. has been added in the same way.

21. True or False: A NOW function formula only generates a new time result every 24 hours.

  • True
  • False (CORRECT)

That’s correct! Like any other formula in an Excel worksheet, a NOW function formula constantly recalculates. The NOW function displays the current time, so the time data changes every time the formula recalculates.

22. You need to create a formula in your worksheet which calculates the number of weekdays between two dates. Which one of the following functions can you use to complete this task?

  • DATEDIF
  • DATE
  • NETWORKDAYS (CORRECT)

That’s correct!

23. Cell A2 in your spreadsheet contains a value of 200. When executed, what message does the following formula display?

=IFS(A2=50,”First Message”,A2=100,”Second Message”,A2=150,”Third Message”,TRUE,”No Message”)

  • First Message
  • Second Message
  • Third Message
  • No Message (CORRECT)

That’s correct! The first, second, and third tests of A2=50, A2=100, and A2=150, respectively, all return a value of FALSE. So, Excel displays the final “value if false” entry of “Third Message” as the formula result.

CONCLUSION – Preparing data for analysis using functions

In conclusion, mastering the common functions introduced in this module significantly enhances one’s ability to prepare Excel data for comprehensive analysis in tools like Power BI. By ensuring data is correctly formatted, cleansed, and structured, users can seamlessly integrate their datasets and perform advanced analytical tasks. This proficiency not only streamlines data workflows but also empowers users to derive actionable insights with greater efficiency and accuracy, ultimately maximizing the value of their data analysis efforts.