COURSE 2: GET STARTED WITH PYTHON

Module 4: Data Structures in Python

GOOGLE ADVANCED DATA ANALYTICS PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

INTRODUCTION – Data Structures in Python

In this module, you’ll explore fundamental data structures such as lists, tuples, dictionaries, sets, and arrays. You’ll also learn about two of the most widely used Python tools for advanced data analysis: NumPy and pandas.

Learning Objectives

  • Explain how to manipulate dataframes using techniques such as selecting and indexing, boolean masking, grouping and aggregating, and merging and joining
  • Describe the main features and methods of core pandas data structures such as dataframes and series
  • Describe the main features and methods of core NumPy data structures such as arrays
  • Define Python tools such as libraries, packages, modules, and global variables
  • Describe the main features and methods of built-in Python data structures such as lists, tuples, dictionaries, and sets

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: LISTS AND TUPLES

1. Lists and their contents are immutable, so their elements cannot be modified, added, or removed.

  • True
  • False (CORRECT)

Correct: Lists and their contents are mutable, so their elements can be modified, added, or removed. A list is a data structure that helps store and manipulate an ordered collection of items.
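As a minimal sketch of what mutability means in practice, the snippet below modifies, adds, and removes elements of a list in place. The list name and city values are hypothetical and chosen only for illustration.

```python
# Hypothetical list used only for illustration.
cities = ["Lagos", "Lima", "Oslo"]

cities[1] = "Cairo"      # modify an existing element in place
cities.append("Tokyo")   # add an element to the end of the list
cities.remove("Lagos")   # remove the first element equal to "Lagos"

print(cities)  # ['Cairo', 'Oslo', 'Tokyo']
```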

2. What Python method adds an element to the end of a list?

  • append() (CORRECT)
  • pop()
  • remove()
  • type()

Correct: Python’s append() method adds an element to the end of a list.

3. A data professional wants to instantiate a tuple. What Python elements can they use to do so? Select all that apply.

  • The insert() function
  • Square brackets
  • Parentheses (CORRECT)
  • The tuple() function (CORRECT)

Correct: A data professional can use parentheses or the tuple() function to instantiate a tuple. A tuple is an immutable sequence that can contain elements of any data type.
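The two ways of instantiating a tuple can be sketched as follows. The variable names and values are made up for the example; the last part shows that assigning to an element of a tuple raises a TypeError.

```python
# Hypothetical values used only for illustration.
point_a = (3, 4.5, "north")          # parentheses
point_b = tuple([3, 4.5, "north"])   # tuple() applied to an iterable

print(point_a == point_b)  # True

try:
    point_a[0] = 10  # tuples do not support item assignment
except TypeError as err:
    print("Tuples are immutable:", err)
```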

4. What Python technique formulaically creates a new list based on the values in an existing list?

  • List comprehension (CORRECT)
  • List nesting
  • List conversion
  • List sequencing

Correct: A list comprehension formulaically creates a new list based on the values in an existing list. A list comprehension functions like a for loop, but is a more efficient and elegant way to create a new list from an existing list.
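To illustrate, here is a short sketch that builds the same new list twice, once with a for loop and once with a list comprehension. The list of prices is a hypothetical example.

```python
# Hypothetical input list.
prices = [10, 25, 40]

# Version 1: an explicit for loop.
doubled_loop = []
for p in prices:
    doubled_loop.append(p * 2)

# Version 2: the equivalent list comprehension.
doubled = [p * 2 for p in prices]

print(doubled)  # [20, 50, 80]
```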

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: DICTIONARIES AND SETS

1. Fill in the blank: In Python, a dictionary’s _____ must be immutable.

  • Order
  • Keys (CORRECT)
  • Lists
  • Sets

Correct: In Python, a dictionary’s keys must be immutable. Immutable keys include, but are not limited to, integers, floats, tuples, and strings. Lists, sets, and other dictionaries are not included in this category since they are mutable.
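A small sketch of this rule, using a hypothetical dictionary: integers, floats, tuples, and strings are accepted as keys, while a list is rejected. Python enforces the rule through hashing, so the error message refers to an unhashable type.

```python
# Hypothetical dictionary with several immutable key types.
inventory = {
    42: "integer key",
    3.14: "float key",
    ("row", 1): "tuple key",
    "name": "string key",
}

error_message = ""
try:
    inventory[["a", "list"]] = "list key"  # a list is mutable
except TypeError as err:
    error_message = str(err)
    print("Rejected:", error_message)
```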

2. In Python, what does the items() method retrieve?

  • A dictionary’s sets
  • Only a dictionary’s values
  • Both a dictionary’s keys and values (CORRECT)
  • Only a dictionary’s keys

Correct: In Python, the items() method is used to retrieve both a dictionary’s keys and values.
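As a quick illustration, the sketch below calls items() on a hypothetical dictionary, alongside keys() and values() for comparison.

```python
# Hypothetical dictionary mapping employee names to roles.
employees = {"Ana": "Nurse", "Raj": "Analyst"}

# items() yields (key, value) pairs.
for name, role in employees.items():
    print(name, role)

pairs = list(employees.items())    # [('Ana', 'Nurse'), ('Raj', 'Analyst')]
names = list(employees.keys())     # ['Ana', 'Raj']
roles = list(employees.values())   # ['Nurse', 'Analyst']
```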

3. A data professional is working with two Python sets. What function can they use to find all the elements from both sets?

  • union() (CORRECT)
  • symmetric_difference()
  • difference()
  • intersection()

Correct: When working with two Python sets, a data professional can use the union() function to find all the elements from both sets.
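The sketch below shows union() next to the other three answer options, using two small hypothetical sets, so the difference between them is easier to see.

```python
# Two hypothetical sets.
a = {1, 2, 3}
b = {3, 4}

all_elements = a.union(b)              # elements in either set: 1, 2, 3, 4
in_common = a.intersection(b)          # elements in both sets: 3
only_in_a = a.difference(b)            # elements in a but not in b: 1, 2
in_exactly_one = a.symmetric_difference(b)  # elements in one set only: 1, 2, 4

print(sorted(all_elements))  # [1, 2, 3, 4]
```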

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: ARRAYS AND VECTORS WITH NUMPY

1. Python libraries and packages include which of the following features? Select all that apply.

  • Cells
  • Modules (CORRECT)
  • Reusable collections of code (CORRECT)
  • Documentation (CORRECT)

Correct: A Python library, or package, broadly refers to a reusable collection of code. Libraries and packages also contain related modules and documentation. You’ll often encounter the terms library and package used interchangeably.

2. What is the core data structure of NumPy?

  • List
  • Array (CORRECT)
  • Dictionary
  • Global variable

Correct: The array is the core data structure of NumPy. The data object itself is known as an n-dimensional array, or ndarray for short. An array can be multidimensional, and all its elements must be of the same data type.
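A minimal sketch of creating a two-dimensional ndarray from a list of equal-length lists. The values are arbitrary.

```python
import numpy as np

# Two inner lists of equal length produce a 2-D array.
arr = np.array([[1, 2, 3], [4, 5, 6]])

print(type(arr))   # <class 'numpy.ndarray'>
print(arr.ndim)    # 2
print(arr.shape)   # (2, 3)
```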

3. A data professional wants to confirm the datatype of the contents of array x. How would they do this?

  • x.ndim
  • type(x)
  • x.dtype (CORRECT)
  • datatype(x)

Correct: dtype is a NumPy array attribute used to check the data type of the contents of an array.
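The difference between x.dtype and type(x) can be sketched as follows, assuming a small hypothetical array x of floats. The second array shows that mixing integers and floats yields a single shared data type.

```python
import numpy as np

# Hypothetical array of floats.
x = np.array([1.5, 2.0, 3.25])

print(x.dtype)   # float64 (data type of the contents)
print(type(x))   # <class 'numpy.ndarray'> (type of the container)

# Mixed integers and floats are upcast to one common dtype.
mixed = np.array([1, 2.5])
print(mixed.dtype)  # float64
```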

PRACTICE QUIZ: TEST YOUR KNOWLEDGE: DATAFRAMES WITH PANDAS

1. Fill in the blank: In pandas, a _____ is a one-dimensional, labeled array.

  • key
  • dataframe
  • series (CORRECT)
  • CSV file

Correct: A series is a one-dimensional, labeled array. Series objects are most often used to represent individual columns or rows of a dataframe.
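As an illustration, the sketch below builds a labeled series directly and then pulls a single column out of a dataframe, which is also a series. All names and values are hypothetical.

```python
import pandas as pd

# A one-dimensional array with labels as its index.
prices = pd.Series([9.99, 24.50, 3.75], index=["pen", "book", "eraser"])
print(prices["book"])  # 24.5

# Selecting one column of a dataframe returns a series.
df = pd.DataFrame({"Price": [9.99, 24.50], "Qty": [3, 1]})
price_column = df["Price"]
print(type(price_column))
```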

2. In pandas, what is Boolean masking used for?

  • Merging data in a dataframe
  • Adding data to a dataframe
  • Filtering data in a dataframe (CORRECT)
  • Deleting data from a dataframe

Correct: In pandas, Boolean masking is used for filtering data in a dataframe. Boolean masking is a filtering technique that overlays a Boolean grid onto a dataframe in order to select only the values in the dataframe that align with the True values of the grid.
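A minimal sketch of Boolean masking on a hypothetical sales dataframe: the comparison produces a series of True and False values, and indexing with that series keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical sales data.
sales = pd.DataFrame({
    "Item": ["pen", "book", "lamp"],
    "Price": [2.0, 15.0, 40.0],
})

mask = sales["Price"] > 10   # Boolean series: False, True, True
expensive = sales[mask]      # keeps only the rows where mask is True

print(expensive)
```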

3. What is a pandas method that groups rows of a dataframe together based on their values at one or more columns?

  • groupby() (CORRECT)
  • agg()
  • keys()
  • values()

Correct: groupby() is a pandas method that groups rows of a dataframe together based on their values at one or more columns. This allows further analysis of the groups.
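To sketch how this works, the example below groups a hypothetical dataframe by its Category column and then computes the mean price within each group.

```python
import pandas as pd

# Hypothetical sales data with a grouping column.
sales = pd.DataFrame({
    "Category": ["toys", "books", "toys", "books"],
    "Price": [10.0, 20.0, 30.0, 40.0],
})

# Group rows by Category, then aggregate Price within each group.
mean_by_category = sales.groupby("Category")["Price"].mean()

print(mean_by_category)  # books averages 30.0, toys averages 20.0
```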

4. A data professional wants to join two dataframes together. The dataframes contain identically formatted data that needs to be combined vertically. What pandas function can the data professional use to join the dataframes?

  • insert()
  • concat() (CORRECT)
  • type()
  • merge()

Correct: The data professional can use the concat() function to join the dataframes. concat() is a pandas function that combines data either by adding it horizontally as new columns for existing rows, or vertically as new rows for existing columns.
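A short sketch of the vertical case, assuming two hypothetical dataframes with identical columns. Passing axis=0 stacks rows, and ignore_index=True renumbers the resulting index.

```python
import pandas as pd

# Two hypothetical dataframes with the same columns.
q1 = pd.DataFrame({"Item": ["pen", "book"], "Price": [2.0, 15.0]})
q2 = pd.DataFrame({"Item": ["lamp"], "Price": [40.0]})

# axis=0 adds the rows of q2 below the rows of q1.
combined = pd.concat([q1, q2], axis=0, ignore_index=True)

print(combined.shape)  # (3, 2)
```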

QUIZ: MODULE 4 CHALLENGE

1. Which of the following statements accurately describe Python lists? Select all that apply.

  • Lists are immutable.
  • Lists can be indexed and sliced. (CORRECT)
  • Lists can contain sequences of elements of any data type. (CORRECT)
  • Lists are mutable. (CORRECT)

Correct!

2. A data professional is working with a list named cities that contains data on global cities. What Python code can they use to add the string ‘Tokyo’ to the end of the list?

  • cities.append('Tokyo') (CORRECT)
  • cities.insert('Tokyo')
  • cities.pop('Tokyo')
  • cities.import('Tokyo')

Correct!

3. In Python, which of the following characters can a data professional use to instantiate a tuple?

  • { }
  • ( ) (CORRECT)
  • [ ]
  • < >

Correct!

4. Which of the following statements accurately describe Python dictionaries? Select all that apply.

  • Dictionaries are instantiated with quotation marks.
  • Dictionaries consist of string-tuple pairs. 
  • Dictionaries consist of collections of key-value pairs. (CORRECT)
  • Dictionaries are instantiated with the dict() function. (CORRECT)

Correct!

5. A data professional is working with a dictionary named employees that contains employee data for a healthcare company. What Python code can they use to retrieve only the dictionary’s values?

  • values.employees()
  • items.employees()
  • employees.items()
  • employees.values() (CORRECT)

Correct!

6. A data professional is working with two Python sets. What function can they use to find the elements that appear in one set or the other, but not in both?

  • union()
  • difference()
  • intersection()
  • symmetric_difference() (CORRECT)

Correct!

7. Where are modules accessed in Python?

  • Within a package or library (CORRECT)
  • Within a dictionary
  • Within a set
  • Within a global variable

Correct!

8. Which of the following statements accurately describe NumPy arrays? Select all that apply.

  • Arrays are immutable.
  • Arrays contain elements of the same data type. (CORRECT)
  • Arrays are mutable. (CORRECT)
  • Arrays can be multidimensional. (CORRECT)

Correct!

9. A data professional is working with a pandas dataframe named sales that contains sales data for a retail website. They want to know the price of the most expensive item. What code can they use to calculate the maximum value of the Price column?

  • sales = 'Price'.max()
  • sales.max().[Price]
  • sales.max().Price
  • sales['Price'].max() (CORRECT)

Correct!

10. A data professional is working with a pandas dataframe. They want to select a subset of rows and columns by index. What method can they use to do so?

  • merge()
  • concat()
  • loc[]
  • iloc[] (CORRECT)

Correct!

11. A data professional wants to merge two pandas dataframes. They want to join the data so all of the keys from both dataframes get included in the merge. What technique can they use to do so?

  • Outer join (CORRECT)
  • Right join
  • Inner join
  • Left join

Correct!
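The join types listed in question 11 can be compared with a small sketch. The two dataframes and their key column are hypothetical; the how argument of pd.merge() selects the join type.

```python
import pandas as pd

# Hypothetical dataframes that share a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

outer = pd.merge(left, right, on="key", how="outer")      # all keys from both
inner = pd.merge(left, right, on="key", how="inner")      # only keys in both
left_join = pd.merge(left, right, on="key", how="left")   # all keys from left

print(sorted(outer["key"]))      # ['a', 'b', 'c', 'd']
print(sorted(inner["key"]))      # ['b', 'c']
print(sorted(left_join["key"]))  # ['a', 'b', 'c']
```

Keys that have no match on one side are kept by the outer and left joins, with missing values filled in as NaN.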

12. In Python, what data structure helps store and manipulate an ordered collection of items?

  • List (CORRECT)
  • Dictionary
  • Tuple
  • Set

Correct!

13. A data professional is working with a list named cities that contains data on global cities. The string ‘Houston’ is the third element in the list. What Python code can they use to remove the string ‘Houston’ from the list?

  • cities.pop(3)
  • cities.pop(1)
  • cities.pop(2) (CORRECT)
  • cities.pop(4)

Correct!
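Because list indexing starts at 0, the third element sits at index 2. A minimal sketch with a hypothetical cities list:

```python
# Hypothetical list; 'Houston' is the third element (index 2).
cities = ["Paris", "Nairobi", "Houston", "Seoul"]

removed = cities.pop(2)  # removes and returns the element at index 2

print(removed)  # Houston
print(cities)   # ['Paris', 'Nairobi', 'Seoul']
```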

14. Which of the following statements accurately describe Python tuples? Select all that apply.

  • Tuples cannot be split into separate variables.
  • Tuples are immutable. (CORRECT)
  • Tuples can be split into separate variables. (CORRECT)
  • Tuples are sequences. (CORRECT)

Correct!
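Splitting a tuple into separate variables is often called unpacking. A brief sketch with hypothetical values:

```python
# Hypothetical tuple holding a city name and its coordinates.
location = ("Tokyo", 35.68, 139.69)

# Unpack the three elements into three variables in one statement.
city, latitude, longitude = location

print(city)      # Tokyo
print(latitude)  # 35.68
```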

15. Fill in the blank: In Python, a dictionary’s keys must be _____.

  • equal
  • mutable
  • immutable (CORRECT)
  • identical

Correct!

16. How do global variables differ from other variables in Python? Select all that apply.

  • Global variables cannot be accessed from a script.
  • Global variables cannot be accessed from a program.
  • Global variables can be accessed from anywhere in a program. (CORRECT)
  • Global variables can be accessed from anywhere in a script. (CORRECT)

Correct!
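As a rough sketch, a global variable is defined at the top level of a script and can be read from inside functions defined in that script. The names TAX_RATE and price_with_tax are hypothetical.

```python
# Hypothetical global variable, defined at the top level of the script.
TAX_RATE = 0.08

def price_with_tax(price):
    # The function reads the global variable without receiving it as an argument.
    return price * (1 + TAX_RATE)

total = price_with_tax(100)
print(total)
```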

17. Fill in the blank: A _____ NumPy array can be created from a list of lists, where each internal list is the same length.

  • four-dimensional
  • one-dimensional
  • three-dimensional
  • two-dimensional (CORRECT)

Correct!

18. A data professional is working with a pandas dataframe named sales that contains sales data for a retail website. They want to know the average price of an item. What code can they use to calculate the mean value of the Price column?

  • sales.mean().[Price]
  • sales.(Price).mean()
  • sales['Price'].mean() (CORRECT)
  • sales = mean().Price

Correct!

19. In pandas, what is the difference between the iloc[] and loc[] methods?

  • iloc[] selects dataframe rows and columns by name; loc[] selects dataframe rows and columns by index.
  • iloc[] selects dataframe rows and columns by index; loc[] selects dataframe rows and columns by name. (CORRECT)
  • iloc[] merges two dataframes horizontally; loc[] merges two dataframes vertically.
  • iloc[] merges two dataframes vertically; loc[] merges two dataframes horizontally.

Correct!
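The distinction can be sketched by selecting the same data both ways from a hypothetical dataframe: iloc[] takes integer positions, while loc[] takes row and column labels.

```python
import pandas as pd

# Hypothetical dataframe with string row labels.
sales = pd.DataFrame(
    {"Price": [2.0, 15.0, 40.0], "Qty": [5, 1, 2]},
    index=["pen", "book", "lamp"],
)

by_position = sales.iloc[0:2, 0]               # rows 0 and 1, column 0
by_label = sales.loc[["pen", "book"], "Price"]  # same cells, chosen by name

print(by_position.equals(by_label))  # True
```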

20. In Python, what types of data can tuples contain? Select all that apply.

  • Modules
  • Floats (CORRECT)
  • Strings (CORRECT)
  • Integers (CORRECT)

Correct!

21. Fill in the blank: In Python, _____ indicate where a list starts and ends.

  • square brackets (CORRECT)
  • parentheses
  • quotation marks
  • braces

Correct!

22. In Python, which of the following characters can a data professional use to instantiate a dictionary?

  • < >
  • ( )
  • [ ]
  • { } (CORRECT)

Correct!

23. A data professional is working with a dictionary named employees that contains employee data for a healthcare company. What Python code can they use to retrieve only the dictionary’s keys?

  • employees.keys() (CORRECT)
  • items.employees()
  • employees.items()
  • keys.employees()

Correct!

24. A data professional is working with two Python sets. What function can they use to find the elements present in one set, but not the other?

  • intersection()
  • difference() (CORRECT)
  • union()
  • symmetric_difference()

Correct!

25. A data professional is working with a NumPy array that has three rows and two columns. They want to change the data into two rows and three columns. What method can they use to do so?

  • reshape() (CORRECT)
  • agg()
  • groupby()
  • type()

Correct!
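A minimal sketch of the scenario in question 25, using arbitrary values: a 3-by-2 array is reshaped into a 2-by-3 array with the same six elements.

```python
import numpy as np

# Three rows and two columns.
arr = np.array([[1, 2], [3, 4], [5, 6]])

# Same six elements, rearranged into two rows and three columns.
reshaped = arr.reshape(2, 3)

print(reshaped.shape)     # (2, 3)
print(reshaped.tolist())  # [[1, 2, 3], [4, 5, 6]]
```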

26. A data professional is working with two Python sets. What function can they use to find the elements that the two sets have in common?

  • symmetric_difference()
  • difference()
  • union()
  • intersection() (CORRECT)

Correct!

27. A data professional is working with a dictionary named employees that contains employee data for a healthcare company. What Python code can they use to retrieve both the dictionary’s keys and values?

  • employees.items() (CORRECT)
  • items.employees()
  • keys.employees()
  • employees.keys()

Correct!

28. A data professional wants to merge two pandas dataframes. They want to join the data so only the keys that are in both dataframes get included in the merge. What technique can they use to do so?

  • Left join
  • Right join
  • Outer join
  • Inner join (CORRECT)

Correct!

29. Fill in the blank: Mutability refers to the ability to _____ the internal state of a data structure.

  • calculate
  • change (CORRECT)
  • classify
  • evaluate

Correct: Mutability refers to the ability to change the internal state of a data structure. Immutability is the reverse, where a data structure or element’s values can never be altered or updated.

30. A tuple is an immutable sequence that can contain elements of any data type.

  • True (CORRECT)
  • False

Correct: A tuple is an immutable sequence that can contain elements of any data type.

31. Fill in the blank: A dictionary is a data structure that consists of a collection of _____ pairs.

  • keyword
  • string
  • key-value (CORRECT)
  • integer

Correct: A dictionary is a data structure that consists of a collection of key-value pairs. In a Python dictionary, looking up a key lets you access the data values associated with that key.

32. In Python, what type of elements does a set contain? Select all that apply.

  • Interchangeable
  • Ordered
  • Non-interchangeable (CORRECT)
  • Unordered (CORRECT)

Correct: In Python, a set is a data structure that contains only unordered, non-interchangeable elements. Each element in a set is unique, so duplicates are dropped.

33. Fill in the blank: In NumPy, _____ enables operations to be performed on multiple components of a data object at the same time.

  • vectorization (CORRECT)
  • evaluation
  • classification
  • conversion

Correct: In NumPy, vectorization enables operations to be performed on multiple components of a data object at the same time. Data professionals often work with large datasets, and vectorized code helps them efficiently compute large quantities of data.
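To make this concrete, the sketch below halves every element of a hypothetical array in one vectorized expression and compares the result with an element-by-element loop.

```python
import numpy as np

# Hypothetical array of prices.
prices = np.array([10.0, 20.0, 30.0])

# Vectorized: one expression applies to every element at once.
half_price = prices * 0.5   # values 5.0, 10.0, 15.0

# Equivalent loop over individual elements, shown for comparison.
half_price_loop = [p * 0.5 for p in prices]

print(np.allclose(half_price, half_price_loop))  # True
```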

34. Fill in the blank: In pandas, a dataframe is a _____-dimensional, labeled data structure.

  • Two (CORRECT)
  • One
  • Three
  • Zero

Correct: In pandas, a dataframe is a two-dimensional, labeled data structure. A dataframe is organized into rows and columns.
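As a closing sketch, the snippet below builds a small hypothetical dataframe, confirms that it has two dimensions, and computes the maximum and mean of one column, as in the sales questions earlier in this guide.

```python
import pandas as pd

# Hypothetical dataframe: three rows, two labeled columns.
sales = pd.DataFrame({
    "Item": ["pen", "book", "lamp"],
    "Price": [2.0, 15.0, 40.0],
})

print(sales.ndim)   # 2
print(sales.shape)  # (3, 2)

highest = sales["Price"].max()    # 40.0
average = sales["Price"].mean()   # 19.0
```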