COURSE 2: USING PYTHON TO INTERACT WITH THE OPERATING SYSTEM

Module 3: Regular Expressions

GOOGLE IT AUTOMATION WITH PYTHON PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

INTRODUCTION – Regular Expressions

In this module, you’ll learn what a regular expression is and why you would use one. We’ll dive into the basics of regular expressions and give examples of wildcards, repetition qualifiers, escape characters, and more. Next up, we’ll explore advanced regular expressions and take a deeper look at repetition qualifiers. You’ll tackle new exercises like capturing groups and extracting PIDs using regexes. Finally, we’ll provide a study guide to serve as your go-to reference for regular expressions.

Learning Objectives

  • Define what a regular expression is and describe why it is useful
  • Use basic regular expressions including simple matching, wildcard, and character classes
  • Explain repetition qualifiers
  • Use advanced regular expressions

PRACTICE QUIZ: REGULAR EXPRESSIONS

1. When using regular expressions, which of the following expressions uses a reserved character that can represent any single character?

  • re.findall("^un", text)
  • re.findall("f.n", text) (CORRECT)
  • re.findall("f*n", text)
  • re.findall("fu$", text)

Nailed it! The dot (.) represents any single character.
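As a small illustration (the sample text is made up for this sketch), the dot matches exactly one character of any kind, so the pattern f.n finds three-character runs that start with "f" and end with "n":

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "fun fan f-n fn fawn"

# The dot stands for exactly one character, so "fn" (no character in
# between) and "fawn" (two characters in between) do not match.
matches = re.findall(r"f.n", text)
print(matches)  # ['fun', 'fan', 'f-n']
```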

2. Which of the following is NOT a function of the Python regex module?

  • re.search()
  • re.match()
  • re.grep() (CORRECT)
  • re.findall()

Keep it up! The grep command utilizes regular expressions on Linux, but is not a part of the standard re Python module.

3. The circumflex [^] and the dollar sign [$] are anchor characters. What do these anchor characters do in regex?

  • Match the start and end of a word.
  • Match the start and end of a line (CORRECT)
  • Exclude everything between two anchor characters
  • Represent any number and any letter character, respectively

Nailed it! The circumflex and the dollar sign specifically match the start and end of a line.
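A minimal sketch of the anchors in Python, using made-up sample text. One detail worth knowing: in Python's re module, ^ and $ anchor to the whole string by default, and they work line by line only when the re.MULTILINE flag is passed.

```python
import re

# Hypothetical three-line sample text.
text = "unhappy\nundo\nrun"

# Default: ^ matches only at the very start of the string.
start_default = re.findall(r"^un", text)

# With re.MULTILINE: ^ also matches at the start of every line.
start_multiline = re.findall(r"^un", text, re.MULTILINE)

# $ matches at the end; only the last line ("run") ends in "un".
end_default = re.findall(r"un$", text)

print(start_default)    # ['un']
print(start_multiline)  # ['un', 'un']
print(end_default)      # ['un']
```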

4. When using regex, some characters represent particular types of characters. Some examples are the dollar sign, the circumflex, and the dot wildcard. What are these characters collectively known as?

  • Special characters (CORRECT)
  • Anchor characters
  • Literal characters
  • Wildcard characters

Awesome! Special characters, sometimes called meta characters, give special meaning to the regular expression search syntax.
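To match a special character literally, it has to be escaped with a backslash. The following sketch, with made-up sample text, shows the difference:

```python
import re

# Hypothetical sample text containing characters that are special in regex.
text = "Total: $5.00 (approx.)"

# Unescaped, $ is an end anchor and . is a wildcard, so nothing is found.
unescaped = re.search(r"$5.00", text)

# Escaped with a backslash, both are treated as literal characters.
escaped = re.search(r"\$5\.00", text)

print(unescaped)        # None
print(escaped.group())  # $5.00
```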

5. What is grep?

  • An operating system
  • A command for parsing strings in Python
  • A command-line regex tool (CORRECT)
  • A type of special character

Right on! The grep command is used to scan files for strings of characters that fit a specified pattern.

6. Which of the following demonstrates how regex (regular expressions) might be used?

  • Recognize an image
  • Calculate Pi
  • Find strings of text that match a pattern (CORRECT)
  • Multiply and divide arrays

Awesome! Regex is a search query that matches the regular expression pattern you’ve specified.

7. Rather than using the index() function of the string module, we can use regular expressions, which are more flexible. After importing the regular expression module re, what regex function might be used instead of standard methods?

  • re.search() (CORRECT)
  • re.index()
  • re.pid()
  • re.regex()

Great job! Using the re module provides more robust solutions, not limited to just re.search().
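A short comparison of the two approaches, assuming a made-up log string. The index() method needs the exact substring, while re.search() accepts a pattern and returns None instead of raising an error when nothing matches:

```python
import re

# Hypothetical log text for illustration.
log = "Error code 404 at line 17"

# str.index() finds an exact substring and raises ValueError if it is absent.
position = log.index("404")

# re.search() takes a pattern and returns a match object or None.
match = re.search(r"\d+", log)
first_number = match.group() if match else None
missing = re.search(r"warning", log)

print(position)      # 11
print(first_number)  # 404
print(missing)       # None
```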

8. Using the terminal, which of the following commands will correctly use grep to find the words “sling” and “sting” (assuming they are in our file, file.txt)?

  • user@ubuntu:~$ grep s+ing /usr/file.txt
  • user@ubuntu:~$ grep sting+sling /usr/file.txt
  • user@ubuntu:~$ grep s.ing /usr/file.txt (CORRECT)
  • user@ubuntu:~$ grep(s.ing) /usr/file.txt

Nice work! In regex, a dot is a wildcard, so it can represent any character. This command will print both “sting” and “sling”, if they are in the file.

PRACTICE QUIZ: BASIC REGULAR EXPRESSIONS 

1. The check_web_address function checks if the text passed qualifies as a top-level web address, meaning that it contains alphanumeric characters (which includes letters, numbers, and underscores), as well as periods, dashes, and a plus sign, followed by a period and a character-only top-level domain such as “.com”, “.info”, “.edu”, etc. Fill in the regular expression to do that, using escape characters, wildcards, repetition qualifiers, beginning and end-of-line characters, and character classes.

  • Answer
import re


def check_web_address(text):
    pattern = r"^[a-zA-Z0-9_.+-]+(?:\.[a-zA-Z0-9-]+)*\.[a-zA-Z]{2,}$"
    result = re.search(pattern, text)
    return result is not None


print(check_web_address("gmail.com"))             # True
print(check_web_address("www@google"))            # False
print(check_web_address("www.Coursera.org"))       # True
print(check_web_address("web-address.com/homepage"))  # False
print(check_web_address("My_Favorite-Blog.US"))    # True

Right on! No bogus web address will get past you!

2. The check_time function checks for the time format of a 12-hour clock, as follows: the hour is between 1 and 12, with no leading zero, followed by a colon, then minutes between 00 and 59, then an optional space, and then AM or PM, in upper or lower case. Fill in the regular expression to do that. How many of the concepts that you just learned can you use here?

  • Answer
import re
def check_time(text):
    pattern = r"^(1[0-2]|[1-9]):[0-5][0-9]\s?(am|pm|AM|PM)$"
    result = re.search(pattern, text)
    return result is not None


print(check_time("12:45pm"))    # True
print(check_time("9:59 AM"))    # True
print(check_time("6:60am"))      # False
print(check_time("five o'clock"))  # False

You nailed it! It’s “time” to celebrate!

3. What does the “r” before the pattern string in re.search(r"Py.*n", sample.txt) indicate?

  • Raw strings (CORRECT)
  • Regex 
  • Repeat
  • Result

Right on! “Raw” strings just means the Python interpreter won’t try to interpret any special characters and, instead, will just pass the string to the function as it is.
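The effect is easiest to see with a sequence such as \b, which Python reads as a backspace character in a normal string but which the regex engine reads as a word boundary. A minimal sketch with made-up sample text:

```python
import re

# In a normal string, "\b" is a single backspace character.
normal = "\bcat\b"
# In a raw string, the backslash is kept, so the regex engine
# receives \b and treats it as a word boundary.
raw = r"\bcat\b"

print(len(normal), len(raw))  # 5 7

# Hypothetical sample text.
text = "cat concat category"
raw_matches = re.findall(raw, text)
normal_matches = re.findall(normal, text)

print(raw_matches)     # ['cat']
print(normal_matches)  # []
```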

4. What does the plus character [+] do in regex?

  • Matches plus sign characters
  • Matches one or more occurrences of the character before it (CORRECT)
  • Matches the end of a string
  • Matches the character before the [+] only if there is more than one

Awesome! The plus character [+] matches one or more occurrences of the character that comes before it.
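For comparison, here is a small sketch (the word list is made up) that contrasts the plus with the star, which also accepts zero occurrences:

```python
import re

# Hypothetical word list for illustration.
words = ["ct", "cat", "caat", "caaat"]

# a+ requires at least one "a" between "c" and "t".
plus_matches = [w for w in words if re.search(r"ca+t", w)]

# a* also accepts zero occurrences, so "ct" matches too.
star_matches = [w for w in words if re.search(r"ca*t", w)]

print(plus_matches)  # ['cat', 'caat', 'caaat']
print(star_matches)  # ['ct', 'cat', 'caat', 'caaat']
```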

5. Fill in the code to check if the text passed contains the vowels a, e and i, with exactly one occurrence of any other character in between.

  • Answer
import re


def check_aei(text):
    result = re.search(r"a.e.i", text)
    return result is not None


print(check_aei("academia"))    # True
print(check_aei("aerial"))       # False
print(check_aei("paramedic"))    # True

Great work! You’ve written your first regular expression!

6. Fill in the code to check if the text passed contains punctuation symbols: commas, periods, colons, semicolons, question marks, and exclamation points.

  • Answer
import re
def check_punctuation(text):
    result = re.search(r"[,.;:?!]+", text)
    return result is not None


print(check_punctuation("This is a sentence that ends with a period."))  # True
print(check_punctuation("This is a sentence fragment without a period"))  # False
print(check_punctuation("Aren't regular expressions awesome?"))  # True
print(check_punctuation("Wow! We're really picking up some steam now!"))  # True
print(check_punctuation("End of the line"))  # False

Right on! You’re seeing the flexibility of character classes in regular expressions!

7. The repeating_letter_a function checks if the text passed includes the letter “a” (lowercase or uppercase) at least twice. For example, repeating_letter_a("banana") is True, while repeating_letter_a("pineapple") is False. Fill in the code to make this work.

  • Answer
import re
def repeating_letter_a(text):
    result = re.search(r"[Aa].*[Aa]", text)
    return result is not None


print(repeating_letter_a("banana"))          # True
print(repeating_letter_a("pineapple"))        # False
print(repeating_letter_a("Animal Kingdom"))   # True
print(repeating_letter_a("A is for apple"))   # True

You get an A! See how handy the repetition qualifiers can be, when we’re working with lots of different text!

8. Fill in the code to check if the text passed has at least 2 groups of alphanumeric characters (including letters, numbers, and underscores) separated by one or more whitespace characters.

  • Answer
import re
def check_character_groups(text):
    result = re.search(r"\w+\s+\w+", text)
    return result is not None


print(check_character_groups("One")) # False
print(check_character_groups("123  Ready Set GO")) # True
print(check_character_groups("username user_01")) # True
print(check_character_groups("shopping_list: milk, bread, eggs.")) # False

You got it! There’s no escaping your regular expression expertise!

9. Fill in the code to check if the text passed looks like a standard sentence, meaning that it starts with an uppercase letter, followed by at least some lowercase letters or a space, and ends with a period, question mark, or exclamation point.

  • Answer
import re
def check_sentence(text):
    result = re.search(r"^[A-Z][a-z\s]+[.!?]$", text)
    return result is not None


print(check_sentence("Is this is a sentence?")) # True
print(check_sentence("is this is a sentence?")) # False
print(check_sentence("Hello")) # False
print(check_sentence("1-2-3-GO!")) # False
print(check_sentence("A star is born.")) # True

Awesome! You’re becoming a regular “regular expression” writer!

PRACTICE QUIZ: ADVANCED REGULAR EXPRESSIONS

1. We’re working with a CSV file, which contains employee information. Each record has a name field, followed by a phone number field, and a role field. The phone number field contains U.S. phone numbers, and needs to be modified to the international format, with “+1-” in front of the phone number. Fill in the regular expression, using groups, to use the transform_record function to do that.

  • Answer
import re
def transform_record(record):
  new_record = re.sub(r"\b(\d{3}-\d{3}-?\d{4})\b", r"+1-\1", record)
  return new_record


print(transform_record("Sabrina Green,802-867-5309,System Administrator")) 
# Sabrina Green,+1-802-867-5309,System Administrator


print(transform_record("Eli Jones,684-3481127,IT specialist")) 
# Eli Jones,+1-684-3481127,IT specialist


print(transform_record("Melody Daniels,846-687-7436,Programmer")) 
# Melody Daniels,+1-846-687-7436,Programmer


print(transform_record("Charlie Rivera,698-746-3357,Web Developer")) 
# Charlie Rivera,+1-698-746-3357,Web Developer

Awesome! Your knowledge of regular expressions will come in handy when you do even more work with files!

2. The multi_vowel_words function returns all words with 3 or more consecutive vowels (a, e, i, o, u). Fill in the regular expression to do that.

  • Answer
import re
def multi_vowel_words(text):
    pattern = r'\b\w*[aeiouAEIOU]{3,}\w*\b'
    result = re.findall(pattern, text)
    return result


print(multi_vowel_words("Life is beautiful")) 
# ['beautiful']


print(multi_vowel_words("Obviously, the queen is courageous and gracious.")) 
# ['Obviously', 'queen', 'courageous', 'gracious']


print(multi_vowel_words("The rambunctious children had to sit quietly and await their delicious dinner.")) 
# ['rambunctious', 'quietly', 'delicious']


print(multi_vowel_words("The order of a data queue is First In First Out (FIFO)")) 
# ['queue']


print(multi_vowel_words("Hello world!")) 
# []

Woohoo! Seriously, your work is glorious, notorious, and victorious!

3. When capturing regex groups, what datatype does the groups method return?

  • A string
  • A tuple (CORRECT)
  • A list
  • A float

Nice job! Because a tuple is returned, we can access each index individually.
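A minimal sketch of that behavior, using a made-up name string. The tuple returned by groups() can be indexed or unpacked like any other tuple:

```python
import re

# Hypothetical "Last, First" input for illustration.
match = re.search(r"^(\w+), (\w+)$", "Lovelace, Ada")

# groups() returns every captured group as a tuple.
groups = match.groups()
print(groups)      # ('Lovelace', 'Ada')
print(groups[0])   # Lovelace

# Because it is a tuple, it can be unpacked directly.
last, first = groups
print(first, last)  # Ada Lovelace
```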

4. The transform_comments function converts comments in a Python script into those usable by a C compiler. This means looking for text that begins with a hash mark (#) and replacing it with double slashes (//), which is the C single-line comment indicator. For the purpose of this exercise, we’ll ignore the possibility of a hash mark embedded inside of a Python command, and assume that it’s only used to indicate a comment. We also want to treat repetitive hash marks (##), (###), etc., as a single comment indicator, to be replaced with just (//) and not (#//) or (//#). Fill in the parameters of the substitution method to complete this function:

  • Answer
import re
def transform_comments(line_of_code):
  result = re.sub(r"#+", "//", line_of_code)
  return result


print(transform_comments("### Start of program")) 
# Should be "// Start of program"
print(transform_comments("  number = 0   ## Initialize the variable")) 
# Should be "  number = 0   // Initialize the variable"
print(transform_comments("  number += 1   # Increment the variable")) 
# Should be "  number += 1   // Increment the variable"
print(transform_comments("  return(number)")) 
# Should be "  return(number)"

Excellent! Now you can convert your comments into other programming languages, you just need to convert the code to go with it!

5. The convert_phone_number function checks for a U.S. phone number format: XXX-XXX-XXXX (3 digits followed by a dash, 3 more digits followed by a dash, and 4 digits), and converts it to a more formal format that looks like this: (XXX) XXX-XXXX. Fill in the regular expression to complete this function.

  • Answer
import re
def convert_phone_number(phone):
  result = re.sub(r"\b(\d{3})-(\d{3})-(\d{4})\b",r"(\1) \2-\3", phone)
  return result


print(convert_phone_number("My number is 212-345-9999.")) # My number is (212) 345-9999.
print(convert_phone_number("Please call 888-555-1234")) # Please call (888) 555-1234
print(convert_phone_number("123-123-12345")) # 123-123-12345
print(convert_phone_number("Phone number of Buckingham Palace is +44 303 123 7300")) # Phone number of Buckingham Palace is +44 303 123 7300

Well done! You’ve captured the right groups to identify what we’re looking for, and nothing else!

6. Fix the regular expression used in the rearrange_name function so that it can match middle names, middle initials, as well as double surnames.

  • Answer
import re
def rearrange_name(name):
    result = re.search(r"^([\w .-]+), ([\w .-]+)$", name)
    if result is None:
        return name
    return "{} {}".format(result[2], result[1])


name=rearrange_name("Kennedy, John F.")
print(name)

Nice work! You’re doing well using regular expressions to capture groups.

7. The long_words function returns all words that are at least 7 characters. Fill in the regular expression to complete this function.

  • Answer
import re
def long_words(text):
    pattern = r'\b\w{7,}\b'
    result = re.findall(pattern, text)
    return result


print(long_words("I like to drink coffee in the morning.")) # ['morning']
print(long_words("I also have a taste for hot chocolate in the afternoon.")) # ['chocolate', 'afternoon']
print(long_words("I never drink tea late at night.")) # []

Nice job! Your regular expressions are getting more and more sophisticated!

8. Add to the regular expression used in the extract_pid function, to return the uppercase message in parentheses, after the process id.

  • Answer
import re
def extract_pid(log_line):
    regex = r"\[(\d+)\]:\s+([A-Z]+)"
    result = re.search(regex, log_line)
    if result is None:
        return None
    return "{} ({})".format(result.group(1), result.group(2))


print(extract_pid("July 31 07:51:48 mycomputer bad_process[12345]: ERROR Performing package upgrade")) # 12345 (ERROR)
print(extract_pid("99 elephants in a [cage]")) # None
print(extract_pid("A string that also has numbers [34567] but no uppercase message")) # None
print(extract_pid("July 31 08:08:08 mycomputer new_process[67890]: RUNNING Performing backup")) # 67890 (RUNNING)

You nailed it! You’re using the tools you’ve learned in the previous lessons, and it shows!

9. We want to split a piece of text by either the word “a” or “the”, as implemented in the following code. What is the resulting split list?

re.split(r"the|a", "One sentence. Another one? And the last one!")

  • ['One sentence. Another one? And ', ' last one!']
  • ['One sentence. Another one? And ', 'the', ' last one!']
  • ['One sentence. Ano', 'r one? And ', ' l', 'st one!'] (CORRECT)
  • ['One sentence. Ano', 'the', 'r one? And ', 'the', ' l', 'a', 'st one!']

Well done! This regular expression uses “the” and “a” as delimiters, no matter where they are in the text, even in the middle of other words like “Another” and “last”.
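For comparison, two variations on the same call are sketched here. Wrapping the pattern in a capturing group makes re.split() keep the delimiters in the result, and adding \b word boundaries limits the split to whole words:

```python
import re

text = "One sentence. Another one? And the last one!"

# Plain alternation: splits on "the" or "a" anywhere, even inside words.
plain = re.split(r"the|a", text)

# Capturing group: the matched delimiters are kept in the result list.
with_delimiters = re.split(r"(the|a)", text)

# Word boundaries: only the standalone words "the" or "a" act as delimiters.
whole_words = re.split(r"\b(?:the|a)\b", text)

print(plain)            # ['One sentence. Ano', 'r one? And ', ' l', 'st one!']
print(with_delimiters)  # ['One sentence. Ano', 'the', 'r one? And ', 'the', ' l', 'a', 'st one!']
print(whole_words)      # ['One sentence. Another one? And ', ' last one!']
```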

WRITING FILE PATHS

1. There are two types of file paths: relative and absolute. What is an absolute file path?

  • A file path that spells out the exact location of the file. (CORRECT)
  • A file path that defaults to the specific directory where the Python command was initially run.
  • A file path used only for calling libraries.
  • A file path used only when the files needed are on the local computer.

Feedback: Correct! An absolute file path is written by drive name, then directory, then file name.

2. As a Python programmer, you will probably choose to use relative file paths more often than absolute file paths. What are the advantages of relative file paths over absolute file paths? Select all that apply. 

  • Absolute file paths are used to read and write files by the file name alone.
  • Python scripts can run only using relative file paths.
  • Relative file paths don’t change by operating system. (CORRECT)
  • With relative file paths, it doesn’t matter that the drive names change from computer to computer. (CORRECT)

Feedback: Correct! Absolute file paths do change by operating system, though, and scripts that rely on file paths for one operating system may not work on another.

Feedback: Correct! With absolute file paths, this can cause a problem if you don’t have all of the drive names recorded.

3. When writing file paths in Python, it’s a best practice to use only forward slashes (/) to separate the directories. Why is that?

  • Because back-slashes are used only for the root.
  • Because back-slashes can be used only in relative file paths. 
  • Because Windows uses back-slashes (\) in file paths.
  • Because back-slashes are a special character in Python. (CORRECT)

Feedback: Correct! If you use a back-slash in a file path in Python, you have to use it again to “escape every instance.”
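A short sketch of what goes wrong and how to avoid it. The Windows-style path below is hypothetical and was picked because \n and \t happen to be escape sequences:

```python
# Hypothetical Windows-style path, written three safe ways.
escaped = "C:\\new_folder\\table.txt"  # every back-slash doubled
raw = r"C:\new_folder\table.txt"       # raw string keeps back-slashes as-is
forward = "C:/new_folder/table.txt"    # forward slashes need no escaping

# Without escaping, "\n" becomes a newline and "\t" becomes a tab.
broken = "C:\new_folder\table.txt"

print(escaped == raw)                  # True
print("\n" in broken, "\t" in broken)  # True True
```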

4. Many Python programmers use the command os.path to wrap directories. What is that specific command designed to do? 

  • Work around platform differences between Windows and Mac/Linux. (CORRECT)
  • Use the current directory. 
  • List files and directories to find the file path you need using this code.
  • To spell out the exact location of the file.

Feedback: Correct! The command os.path allows programmers to work around the file structure differences between different platforms.
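As a minimal sketch (directory and file names are made up), os.path.join() builds a path with the separator of the platform the script runs on, and os.path.split() and os.path.splitext() take it apart again:

```python
import os

# Hypothetical directory and file names.
path = os.path.join("data", "reports", "summary.csv")

# Prints data/reports/summary.csv on Linux and macOS,
# and data\reports\summary.csv on Windows.
print(path)

# Split the path back into its parts.
directory, filename = os.path.split(path)
name, extension = os.path.splitext(filename)
print(filename, extension)  # summary.csv .csv
```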

5. You can call a file with a relative file path using the file name, provided you also use the CWD: What does CWD stand for?

  • Content working directory
  • Command working directory 
  • Current working directory (CORRECT)
  • Current web directory

Feedback: Correct! You need the current working directory to call a file using a relative file path.
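The relationship between the two kinds of path can be sketched as follows. The file name is hypothetical, and the file does not need to exist for the path functions to work:

```python
import os

# The current working directory: where relative paths are resolved from.
cwd = os.getcwd()

# Hypothetical relative path (just a file name).
relative = "file.txt"

# abspath() combines the current working directory with the relative path.
absolute = os.path.abspath(relative)

print(os.path.isabs(relative))  # False
print(os.path.isabs(absolute))  # True
```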

WORK WITH REGULAR EXPRESSIONS

1. Which of the following tasks can be accomplished using regular expressions? Select all that apply.

  • Sorting a list of numbers
  • Extracting email addresses from a text (CORRECT)
  • Replacing a specific pattern in a text (CORRECT)
  • Finding all occurrences of a word in a text (CORRECT)
  • Generating random numbers

Correct

2. What is a characteristic of a CSV file? 

  • Data in each row is separated by a special character. (CORRECT)
  • CSV files can contain only numeric data.
  • It cannot be read by a text editor.
  • Each line represents a different column of data.

Correct

3. What does the variable old_domain_pattern store in the lab’s first set of codes?

  • It stores the address after replacing the old domain.
  • It stores a regular expression pattern to identify the old domain. (CORRECT)
  • It stores the entire email address.
  • It stores the new domain name.

Correct

4. Which of the following statements are correct regarding the re.sub() function? Select all that apply.

  • The re.sub() function allows the use of capturing groups to reuse matched patterns in the replacement text. (CORRECT)
  • The re.sub() function directly modifies the original input text.
  • The replacement string in re.sub() can contain backreferences to captured groups. (CORRECT)
  • The re.sub() function takes four parameters: the pattern, the replacement string, the input text, and an optional flags parameter. (CORRECT)

Correct
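A brief sketch of these points with made-up date strings. It also shows that re.sub() returns a new string and leaves the input untouched. In Python's re module the function accepts an optional count argument as well, used here as a keyword:

```python
import re

# Hypothetical input text.
original = "2024-01-15 and 2023-12-31"
date_pattern = r"(\d{4})-(\d{2})-(\d{2})"

# \1, \2 and \3 in the replacement reuse the captured year, month and day.
swapped = re.sub(date_pattern, r"\3/\2/\1", original)

# count=1 limits the substitution to the first match.
first_only = re.sub(date_pattern, r"\3/\2/\1", original, count=1)

# flags changes how the pattern is matched, e.g. ignoring case.
ampersand = re.sub(r"and", "&", "This AND that", flags=re.IGNORECASE)

print(swapped)     # 15/01/2024 and 31/12/2023
print(first_only)  # 15/01/2024 and 2023-12-31
print(ampersand)   # This & that
print(original)    # 2024-01-15 and 2023-12-31 (unchanged)
```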

5. You have been tasked with initializing two lists: old_domain_email_list and new_domain_email_list. The goal is to populate old_domain_email_list with email addresses that contain the old domain name and meet the criteria defined in the contains_domain function. How would you achieve this in Python based on the content in your lab?

  • Manually inspect each email address in the user_email_list and append those containing the old domain to old_domain_email_list.
  • Use a for loop to iterate over the user_email_list. For each email address, call the contains_domain function to check if it matches the old domain. 
  • Initialize old_domain_email_list with an empty list. Use a regular expression to find all email addresses in the user_email_list that match the old domain pattern and add them to old_domain_email_list. (CORRECT)
  • Use a list comprehension to extract all email addresses from the user_email_list that match the old domain pattern and store them in old_domain_email_list.

Correct

6. What is a key benefit of using Python for creating reports and employing regular expressions?

  • To develop virtual reality applications.
  • To optimize website performance.
  • To enhance cybersecurity measures.
  • To automate repetitive tasks and data analysis. (CORRECT)

Correct

7. What are headers in the context of a CSV file?

  • Headers are special characters or symbols used to separate data items in a CSV file.
  • Headers refer to the metadata that describes the source and author of the CSV file.
  • Headers are additional rows added at the end of a CSV file to summarize the data.
  • Headers are the first row in a CSV file, typically containing the names of each column. (CORRECT)

Correct

8. In the process of updating email domains in a CSV file using Python, how do the contains_domain and replace_domain functions work together?

  • contains_domain uses a regular expression to identify emails with a specific domain, and replace_domain replaces these domains with new ones. (CORRECT)
  • contains_domain encrypts the email addresses, and replace_domain decrypts them back to their original form.
  • contains_domain sorts the email addresses, and replace_domain reverses their order.
  • contains_domain deletes email addresses with outdated domains, and replace_domain creates new email addresses from scratch.

Correct

9. What is a regular expression? 

  • A method of encrypting data
  • A function in Python for handling exceptions
  • A sequence of characters that forms a search pattern (CORRECT)
  • A type of data structure in Python

Correct

10. Which method from the csv module is used to write rows of data into a CSV file? 

  • csv.write_rows()
  • csv.write()
  • csv.writerows() (CORRECT)
  • csv.add_rows()

Correct
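In practice, writerows() is called on the writer object that csv.writer() returns, not on the csv module itself. A minimal sketch, writing to an in-memory buffer instead of a real file so that it runs anywhere (the rows are made up):

```python
import csv
import io

# Hypothetical header row and data row.
rows = [
    ["Full Name", "Email Address"],
    ["Blossom Gill", "blossom@abc.edu"],
]

# An in-memory text buffer stands in for an opened output file.
buffer = io.StringIO()
writer = csv.writer(buffer)

# writerows() writes every row in the list in one call.
writer.writerows(rows)

lines = buffer.getvalue().splitlines()
print(lines)  # ['Full Name,Email Address', 'Blossom Gill,blossom@abc.edu']
```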

11. In the provided code snippet, what is the purpose of the replace_domain function?

def replace_domain(address, old_domain, new_domain):
    old_domain_pattern = r'' + old_domain + '$'
    address = re.sub(old_domain_pattern, new_domain, address)
    return address

  • To remove any domain from the email address
  • To create a new email address with the old domain replaced by the new one (CORRECT)
  • To extract the username part of an email address
  • To validate the format of an email address

Correct
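A runnable sketch of the same idea with made-up addresses. The call to re.escape() is an addition in this sketch, not part of the snippet above: it makes the dot in the domain match only a literal dot.

```python
import re

def replace_domain(address, old_domain, new_domain):
    # re.escape() is an assumption added here so that "." in the
    # domain is not treated as a wildcard. "$" anchors the match
    # to the end of the address.
    old_domain_pattern = re.escape(old_domain) + "$"
    return re.sub(old_domain_pattern, new_domain, address)

# Hypothetical addresses and domains.
updated = replace_domain("pat@abc.edu", "abc.edu", "xyz.edu")
unchanged = replace_domain("abc.edu.user@other.org", "abc.edu", "xyz.edu")

print(updated)    # pat@xyz.edu
print(unchanged)  # abc.edu.user@other.org
```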

12. In regular expressions, what does the re.split() function do? 

  • Splits a string into a list of substrings based on a regular expression pattern (CORRECT)
  • Checks if a string contains a specific substring or not
  • Concatenates multiple strings into one string
  • Removes all occurrences of a specified substring from a string

Correct

13. What is the significance of the variables csv_file_location and report_file in the code snippet from the lab?

csv_file_location = '<csv-file-location>'
report_file = '<data-directory>' + '/updated_user_emails.csv'

  • They store the usernames for accessing a remote server.
  • They define regular expressions for email domain validation.
  • They store file paths for input and output CSV files. (CORRECT)
  • They contain the old and new domain names for email address replacement.

Correct

14. What is the next step after declaring the output file variable report_file at the beginning of the script?

  • Close the CSV file
  • Write the list to the output file (CORRECT)
  • Initialize the user_email_list
  • Define headers for the output file

Correct

15. What is the second step in the process of replacing old domain names with new ones in a CSV file using Python?

  • Replacing old domain names with new ones in the email addresses
  • Creating a list of old domain email addresses
  • Reading data from the CSV file (CORRECT)
  • Defining variables for the old and new domain names

Correct

16. Which function is used to match a regular expression pattern in Python?

  • re.search()
  • re.match()
  • re.findall()
  • All of the above (CORRECT)

Correct

17. Why is it important to write the list to an output file in a Python script, as specified by the variable report_file?

  • Writing to an output file is a required step in Python programming for all data manipulation tasks to ensure memory efficiency.
  • The output file serves as a backup for the original data, preventing any loss of information in case of script errors.
  • Writing the updated list to an output file provides a permanent record of the changes made, allowing for data preservation and further analysis. (CORRECT)
  • Writing to an output file allows for temporary storage of data, which is essential for debugging the script.

Correct

18. Which Python libraries or modules are commonly used to update domain names to a new specified domain and save all the modified domain names to a separate file?

  • re (regular expressions) and requests
  • os and sys
  • pandas and numpy
  • re (regular expressions) and open() function (CORRECT)

Correct

19. Which Python function would you use to open a CSV file? 

  • open() (CORRECT)
  • csv.reader()
  • file.open()
  • csv.open()

Correct
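The built-in open() function returns the file object, and csv.reader() can then wrap that object to parse the rows. A minimal sketch, assuming a small made-up file that is created in a temporary directory so the example is self-contained:

```python
import csv
import os
import tempfile

# Hypothetical file contents.
sample_rows = [
    ["Full Name", "Email Address"],
    ["Blossom Gill", "blossom@abc.edu"],
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "user_emails.csv")

    # Create the sample file first so there is something to read.
    with open(path, "w", newline="") as csv_file:
        csv.writer(csv_file).writerows(sample_rows)

    # open() opens the file; csv.reader() parses each line into a list.
    with open(path, "r", newline="") as csv_file:
        rows = list(csv.reader(csv_file))

print(rows[0])  # ['Full Name', 'Email Address']
print(rows[1])  # ['Blossom Gill', 'blossom@abc.edu']
```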

20. In the Python script for processing user_emails.csv, what is the purpose of the contains_domain function?

  • To encrypt email addresses in the CSV file for security
  • To check if an email address belongs to a specific domain, using Regular Expressions (RegEx) (CORRECT)
  • To add a new domain to each email address in the CSV file
  • To count the number of email addresses in the CSV file

Correct
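The lab's own implementation is not reproduced in this guide, so the following is only one plausible sketch of such a check. The pattern, the use of re.escape(), and the sample addresses are all assumptions made for illustration:

```python
import re

def contains_domain(address, domain):
    # Assumed pattern: a user part made of word characters, dots or
    # dashes, then "@", then the literal domain at the end of the string.
    domain_pattern = r"[\w.-]+@" + re.escape(domain) + r"$"
    return re.match(domain_pattern, address) is not None

# Hypothetical addresses.
user_email_list = ["pat@abc.edu", "sam@xyz.edu", "lee@abc.edu.org"]

# Collect only the addresses that belong to the old domain.
old_domain_email_list = [
    email for email in user_email_list if contains_domain(email, "abc.edu")
]
print(old_domain_email_list)  # ['pat@abc.edu']
```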

21. What is the purpose of initializing the old_domain_email_list in the code from the lab?

  • To perform a substitution operation on email addresses
  • To store email addresses with the new domain
  • To store email addresses with the old domain that match the regex pattern (CORRECT)
  • To store all email addresses from user_email_list

Correct

22. What is the purpose of the replace_domain function in the process of replacing old domain names with new ones in a CSV file using Python?

  • To iterate over a list of email addresses
  • To replace the old domain with the new domain in an email address (CORRECT)
  • To read data from a CSV file
  • To write the updated list to a CSV file

Correct

23. You have been tasked with replacing old domain names with new ones in a CSV file using Python, based on the lab. What is the correct sequence of steps to accomplish this task?

  • Define headers for the output file, close the CSV file, initialize user_email_list, and write the updated user_data_list to the output file.
  • Close the CSV file, define headers for the output file, initialize user_email_list, and write the updated user_data_list to the output file.
  • Initialize user_email_list, define headers for the output file, write the updated user_data_list to the output file, and close the CSV file. (CORRECT)
  • Initialize user_email_list, close the CSV file, define headers for the output file, and write the updated user_data_list to the output file.

Correct

24. Why is it important to define headers for the output file when replacing old domain names with new ones in a CSV file using Python, as described in the lab?

  • To save memory and improve script performance
  • To ensure that the email addresses are correctly replaced
  • To identify the column that contains email addresses in the output file (CORRECT)
  • To prevent errors when opening the output file

Correct

CONCLUSION – Regular Expressions

In conclusion, this module has provided a thorough exploration of regular expressions, ensuring a solid understanding of their conceptual framework and practical applications. From the fundamentals, including wildcards, repetition qualifiers, and escape characters, to the advanced aspects, with a specific focus on repetition qualifiers, you have gained a comprehensive grasp of regex principles.

Engaging in exercises involving capturing groups and extracting PIDs has reinforced your practical skills. To support your ongoing learning, a detailed study guide has been provided, offering a valuable and comprehensive reference for mastering the intricacies of regular expressions. Armed with this knowledge, you are well-prepared to apply regular expressions effectively in your coding endeavors.