COURSE 4 – TROUBLESHOOTING AND DEBUGGING TECHNIQUES

Module 2: Slowness

GOOGLE IT AUTOMATION WITH PYTHON PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

Last updated:

INTRODUCTION – Slowness

In this module, you’ll learn what factors can cause a machine or program to act slowly. You’ll explore ways of addressing slowness by identifying the bottleneck that might be causing it, and you’ll learn about tools that show which resources are being exhausted, including iotop, iftop, and Activity Monitor in macOS. You’ll then learn how computers use resources and the differences between the CPU, RAM, and cache, so you can find possible causes of slowness in your machines or scripts. After that, you’ll learn how to write efficient code and use profilers to identify where your code spends most of its time. You’ll also look at data structures, including lists, tuples, dictionaries, and sets, learn which ones are right for a given task, and see how to avoid expensive loops. Then, you’ll dive into complex slowness problems and see how concurrency and caching services can improve the execution of your code. Finally, you’ll learn how threads can make your code run much faster.

Learning Objectives

  • Understand what slowness is and utilize tools to identify the bottleneck causing the issue
  • Utilize tools like iotop and iftop to identify exhausted resources
  • Understand the different computer components and how they can contribute to slowness
  • Understand how to write efficient code, and use appropriate data structures and loops to help your code run efficiently
  • Utilize concurrency, caching services, and threads to improve the execution of your code

PRACTICE QUIZ: UNDERSTANDING SLOWNESS

1. Which of the following will an application spend the longest time retrieving data from?

  • CPU L2 cache
  • RAM
  • Disk
  • The network (CORRECT)

Right on! An application will take the longest time trying to retrieve data from the network.

2. Which tool can you use to verify reports of ‘slowness’ for web pages served by a web server you manage?

  • The top tool
  • The ab tool (CORRECT)
  • The nice tool
  • The pidof tool

Great work! ab (ApacheBench) is a tool for benchmarking a web server: it sends repeated requests and reports their average timing, which helps you confirm how slow the server really is.

3. If our computer running Microsoft Windows is slow, which performance monitoring tools can we use to analyze system resource usage and identify the bottleneck? (Check all that apply)

  • Performance Monitor (CORRECT)
  • Resource Monitor (CORRECT)
  • Activity Monitor
  • top

Excellent! Performance Monitor is a system monitoring program that provides basic CPU and memory resource measurements in Windows.

Nice job! Resource Monitor is an advanced resource monitoring utility that provides data on hardware and software resources in real time.

4. Which of the following programs is likely to run faster and more efficiently, with the least slowdown?

  • A program with a cache stored on a hard drive
  • A program small enough to fit in RAM (CORRECT)
  • A program that reads files from an optical disc
  • A program that retrieves most of its data from the Internet

Nice work! Since RAM access is faster than accessing a disk or network, a program that can fit in RAM will run faster.

5. What might cause a single application to slow down an entire system? (Check all that apply)

  • A memory leak (CORRECT)
  • The application relies on a slow network connection
  • Handling files that have grown too large (CORRECT)
  • Hardware faults

Woohoo! Memory leaks happen when an application doesn’t release memory when it is supposed to.

Awesome! If files generated by the application have grown overly large, slowdown will occur if the application needs to store a copy of the file in RAM in order to use it.

6. When addressing slowness, what do you need to identify?

  • The bottleneck (CORRECT)   
  • The device  
  • The script  
  • The system  

Woohoo! The bottleneck could be the CPU time, or time spent reading data from disk.  

7. After retrieving data from the network, how can an application access that same data quicker next time?

  • Use the swap
  • Create a cache (CORRECT)
  • Use memory leak
  • Store in RAM

You nailed it! A cache stores data in a form that’s faster to access than its original form.

8. A computer becomes sluggish after a few days, and the problem goes away after a reboot. Which of the following is a possible cause?

  • Files are growing too large. 
  • A program is keeping some state while running. (CORRECT)
  • Files are being read from the network. 
  • Hard drive failure.

Awesome! A program that keeps accumulating state while it runs (for example, holding data in memory without ever releasing it) can slow down a computer until a reboot clears that state.

PRACTICE QUIZ: SLOW CODE

1. Which of the following is NOT considered an expensive operation?

  • Parsing a file
  • Downloading data over the network
  • Going through a list
  • Using a dictionary (CORRECT)

Awesome! Looking up an element in a dictionary is faster than going through a list.
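
To make the difference concrete, here is a minimal timing sketch using the standard timeit module. The data set (10,000 made-up user names) is an assumption for illustration; exact timings depend on your machine, but the dictionary lookup should win by a wide margin because a list membership test scans elements one by one while a dictionary uses a hash lookup.

```python
import timeit

# Hypothetical data set: the same 10,000 user names stored two ways.
names_list = [f"user{i}" for i in range(10_000)]
names_dict = {name: True for name in names_list}

target = "user9999"  # worst case for the list: it is the last element

# Membership in a list checks elements one by one (O(n)).
list_time = timeit.timeit(lambda: target in names_list, number=1_000)

# Membership in a dictionary is a hash lookup (O(1) on average).
dict_time = timeit.timeit(lambda: target in names_dict, number=1_000)

print(f"list: {list_time:.4f}s  dict: {dict_time:.4f}s")
```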

2. Which of the following may be the most expensive to carry out in most automation tasks in a script?

  • Loops (CORRECT)
  • Lists
  • Vector
  • Hash

Great work! Loops that run many times, especially when each iteration includes subtasks that must complete before moving on, can be very expensive for most automation tasks.

3. Which of the following statements represents the most sound advice when writing scripts?

  • Aim for every speed advantage you can get in your code
  • Use expensive operations often
  • Start by writing clear code, then speed it up only if necessary (CORRECT)
  • Use loops as often as possible

Awesome! If we don’t notice any slowdown, then there’s little point trying to speed it up.

4. In Python, what is a data structure that stores multiple pieces of data, in order, which can be changed later?

  • A hash
  • Dictionaries
  • Lists (CORRECT)
  • Tuples

Right on! Lists are efficient, and if we are either iterating through the entire list or are accessing elements by their position, lists are the way to go.

5. What command, keyword, module, or tool can be used to measure the amount of time it takes for an operation or program to execute? (Check all that apply)

  • time (CORRECT)
  • kcachegrind (CORRECT)
  • cProfile (CORRECT)
  • break

Excellent! We can precede the name of our commands and scripts with the “time” shell builtin and the shell will output execution time statistics when they complete.

Nice work! kcachegrind is a tool for visualizing profile data; when the program is run under a profiler that records each function call, it can show how long the execution of each function takes.

Great job! cProfile provides deterministic profiling of Python programs, including how often and for how long various parts of the program executed.
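
A minimal sketch of profiling with cProfile from inside a script, with pstats used to sort and print the results. The functions slow_sum and main are hypothetical examples; you can also profile a whole script from the command line with python3 -m cProfile -s cumulative your_script.py (the script name is a placeholder).

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Hypothetical loop-heavy function, so it stands out in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def main():
    return [slow_sum(100_000) for _ in range(5)]

profiler = cProfile.Profile()
profiler.enable()   # start recording function calls
results = main()
profiler.disable()  # stop recording

# Sort by cumulative time and print the five most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```

The report lists how many times each function was called (ncalls) and how much time was spent in it, which is where you look first when deciding what to optimize.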

6. Which of the following has values associated with keys in Python?

  • A hash
  • A dictionary (CORRECT)
  • A HashMap
  • An Unordered Map

You nailed it! Python uses a dictionary to store values, each with a specific key.

7. Your Python script searches a directory and runs other tasks in a single loop function for hundreds of computers on the network. Which action will make the script the least expensive?

  • Read the directory once (CORRECT)
  • Loop the total number of computers  
  • Service only half of the computers  
  • Use more memory 

Awesome! Reading the directory once before the loop will make the script less expensive to run.  
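
A minimal sketch of that idea, assuming a hypothetical setup where each computer has a config file named after it in one directory. The expensive call, os.listdir(), runs once before the loop, and every iteration then does a cheap set lookup instead of re-reading the directory.

```python
import os
import tempfile

def process_computers(computers, directory):
    """Hypothetical per-computer check that lists the directory only once."""
    # One listdir() call before the loop instead of one per computer.
    entries = set(os.listdir(directory))
    report = {}
    for computer in computers:
        # Hypothetical task: is there a config file named after this computer?
        report[computer] = f"{computer}.cfg" in entries
    return report

with tempfile.TemporaryDirectory() as tmp:
    # Create one hypothetical config file so the example has something to find.
    open(os.path.join(tmp, "host1.cfg"), "w").close()
    report = process_computers(["host1", "host2"], tmp)

print(report)  # → {'host1': True, 'host2': False}
```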

8. Your script calculates the average number of active user sessions during business hours in a seven-day period. How often should a local cache be created to give a good enough average without updating too often?  

  • Once a week
  • Once a day (CORRECT)
  • Once a month  
  • Once every 8 hours 

Woohoo! A local cache for every day can be accessed quickly, and processed for a seven-day average calculation.    

9. You use the time command to determine how long a script runs to complete its various tasks. Which output value will show the time spent doing operations in the user space?  

  • Real  
  • Wall-clock  
  • Sys  
  • User (CORRECT)

You nailed it! The user value is the time spent doing operations in the user space.  
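
The same distinction can be observed from inside Python. This is an analogue of the shell's time command, not the command itself: os.times() reports the process's user and system CPU time, and time.perf_counter() measures wall-clock (real) time. The helper names measure and busy_then_sleep are hypothetical.

```python
import os
import time

def measure(func):
    """Return real (wall-clock), user, and sys time for one call of func."""
    start_wall = time.perf_counter()
    start = os.times()
    func()
    end = os.times()
    return {
        "real": time.perf_counter() - start_wall,
        "user": end.user - start.user,      # CPU time spent in user space
        "sys": end.system - start.system,   # CPU time spent in the kernel
    }

def busy_then_sleep():
    sum(i * i for i in range(200_000))  # CPU work: counted as user time
    time.sleep(0.5)                     # waiting: adds to real time only

timings = measure(busy_then_sleep)
print(timings)
```

Because the function spends most of its time sleeping, real time ends up much larger than user time, which is the same pattern you would see from time for a script that mostly waits on disk or network.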

PRACTICE QUIZ: WHEN SLOWNESS PROBLEMS GET COMPLEX

1. Which of the following can cache database queries in memory for faster processing of automated tasks?

  • Threading
  • Varnish
  • Memcached (CORRECT)
  • SQLite

You nailed it! Memcached is a caching service that keeps the most commonly accessed database queries in RAM.
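
Memcached runs as a separate server and is accessed through a client library, so it can't be shown in a self-contained example. As a stand-in, this sketch uses a plain Python dictionary to illustrate the cache-aside pattern a Memcached setup typically follows: check the cache first, and only run the query on a miss. The run_query function, the 60-second expiry, and the query text are all hypothetical.

```python
import time

CACHE_TTL_SECONDS = 60  # assumption: how long a cached result stays valid
_cache = {}             # query string -> (time stored, result)
query_calls = 0         # counts how often the "database" is actually hit

def run_query(query):
    """Hypothetical stand-in for an expensive database query."""
    global query_calls
    query_calls += 1
    time.sleep(0.1)  # pretend the database is slow
    return f"result of {query!r}"

def cached_query(query):
    now = time.monotonic()
    hit = _cache.get(query)
    if hit is not None and now - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]                # cache hit: skip the database entirely
    result = run_query(query)        # cache miss: query, then remember it
    _cache[query] = (now, result)
    return result

first = cached_query("SELECT * FROM users")
second = cached_query("SELECT * FROM users")
print(first == second, query_calls)  # → True 1
```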

2. What module specifies parts of a code to run in separate asynchronous events?

  • Threading
  • Futures
  • Asyncio (CORRECT)
  • Concurrent

Awesome! Asyncio is a module that lets you specify parts of the code to run as separate asynchronous events.
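
A minimal asyncio sketch: three coroutines that each wait 0.3 seconds are scheduled together with asyncio.gather(), so the total wait is roughly 0.3 seconds rather than 0.9. The fetch coroutine is hypothetical; asyncio.sleep() stands in for waiting on the network or disk.

```python
import asyncio
import time

async def fetch(name, delay):
    # asyncio.sleep stands in for waiting on I/O; while one coroutine
    # waits, the event loop runs the others.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # gather() runs all three coroutines concurrently and returns
    # their results in the order they were passed in.
    return await asyncio.gather(fetch("a", 0.3), fetch("b", 0.3), fetch("c", 0.3))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```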

3. Which of the following allows our program to run multiple instructions in parallel?

  • Threading (CORRECT)
  • Swap space
  • Memory addressing
  • Dual SSD

Woohoo! Threading allows a process to split itself into parallel tasks.
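
A minimal threading sketch, assuming a hypothetical download task where time.sleep() stands in for a slow network call. Note that in the standard CPython build the global interpreter lock (GIL) lets only one thread run Python bytecode at a time, so threads mainly speed up I/O-bound work like this; CPU-heavy work usually needs multiple processes instead.

```python
import threading
import time

results = {}

def download(host):
    """Hypothetical I/O task: sleep stands in for a network round trip."""
    time.sleep(0.3)
    results[host] = "ok"

hosts = ["host1", "host2", "host3", "host4"]
threads = [threading.Thread(target=download, args=(h,)) for h in hosts]

start = time.perf_counter()
for t in threads:
    t.start()  # all four threads begin waiting at about the same time
for t in threads:
    t.join()   # wait for every thread to finish
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # about 0.3s instead of 1.2s sequentially
```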

4. What is the name of the field of study in computer science that concerns itself with writing programs and operations that run in parallel efficiently?

  • Memory management
  • Concurrency (CORRECT)
  • Threading
  • Performance analysis

Right on! Concurrency in computer science is the ability of different sections or units of a program, algorithm, or problem to be executed out of order or in partial order, without impacting the final result.

5. What would we call a program that often leaves our CPU with little to do as it waits on data from a local disk and the Internet?

  • Memory-bound
  • CPU-bound
  • User-bound
  • I/O bound (CORRECT)

Right on! If our program mainly finds itself waiting on local disks or the network, it is I/O bound.

6. A script is _____ if you are running operations in parallel using all available CPU time.  

  • I/O bound  
  • Threading  
  • CPU bound (CORRECT)
  • Asyncio

Right on! A script is CPU bound if you’re running operations in parallel using all available CPU time; at that point, the CPU itself is the bottleneck.

7. You’re creating a simple script that runs a query on a list of product names of a very small business, and initiates automated tasks based on those queries. Which of the following would you use to store product names?  

  • SQLite  
  • Microsoft SQL Server  
  • Memcached 
  • CSV file (CORRECT)

Nice job! A simple CSV file is enough to store a list of product names.  
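
A minimal sketch of reading product names from a CSV file with the standard csv module. The one-column layout with a "name" header and the product names are assumptions; io.StringIO stands in for a real file such as open("products.csv").

```python
import csv
import io

# Assumption: a one-column CSV with a header row.
data = io.StringIO("name\nWidget\nGadget\nDoohickey\n")

products = [row["name"] for row in csv.DictReader(data)]
print(products)  # → ['Widget', 'Gadget', 'Doohickey']

# A simple "query": which products start with a given letter?
matches = [p for p in products if p.startswith("G")]
print(matches)  # → ['Gadget']
```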

8. A company has a single web server hosting a website that also interacts with an external database server. The web server is processing requests very slowly. Checking the web server, you found the disk I/O has high latency. Where is the cause of the slow website requests most likely originating from?  

  • Local disk (CORRECT)
  • Remote database  
  • Slow Internet  
  • Database index

You got it! The local disk I/O latency is causing the application to wait too long for data from disk.  

9. Which module makes it possible to run operations in a script in parallel that makes better use of CPU processing time?  

  • Executor  
  • Futures (CORRECT)
  • Varnish  
  • Concurrency  

Woohoo! The futures module makes it possible to run operations in parallel using different executors.  
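
A minimal sketch with concurrent.futures, the standard-library module the course calls futures. The check_host task is hypothetical, with time.sleep() standing in for a network round trip. For CPU-bound work you can swap ThreadPoolExecutor for ProcessPoolExecutor so each worker is a separate process on its own core; that variant needs a picklable, module-level function and an if __name__ == "__main__": guard on platforms that spawn new processes.

```python
from concurrent import futures
import time

def check_host(host):
    """Hypothetical I/O-bound task: sleep stands in for a network call."""
    time.sleep(0.2)
    return f"{host}: up"

hosts = [f"host{i}" for i in range(8)]

start = time.perf_counter()
with futures.ThreadPoolExecutor(max_workers=8) as executor:
    # map() runs check_host on every host using the pool's workers
    # and yields results in the same order as the input.
    statuses = list(executor.map(check_host, hosts))
elapsed = time.perf_counter() - start
print(statuses[:2], f"{elapsed:.2f}s")
```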

PERFORMANCE TUNING IN PYTHON SCRIPTS

1. Which of the following best describes a CPU-bound task?

  • A task that frequently waits for network responses.
  • A task that consistently requires more memory than is available.
  • A task that often waits for I/O operations to complete.
  • A task that primarily utilizes only one of the available CPU cores, even when others are free. (CORRECT)

Correct

2. Which of the following best describes rsync (remote sync)?

  • rsync is a system command that enables administrative control over user access to files within networked computers.
  • rsync is a tool that automates system backups by periodically duplicating all files without checking for changes.
  • rsync is a utility for efficiently transferring and synchronizing files between a computer and an external hard drive and across networked computers by comparing the modification time and size of files. (CORRECT)
  • rsync is a networking tool used for monitoring data usage and bandwidth in real-time across multiple computer systems.

Correct

3. In the lab, you employed multiprocessing to reduce backup time. Why was multiprocessing the right choice in this example?

  • It added timestamps to each operation, which enabled better tracking
  • The task was CPU-bound, and multiprocessing leveraged unused CPU cores to run the script significantly faster (CORRECT)
  • It leveraged a faster internet connection to fix the network bottleneck
  • It reduced the number of variables used to speed up the process

Correct

4. True or false: psutil is a cross-platform library for retrieving information on running processes and system utilization (CPU, memory, disks, network, sensors) in Python. 

  • True (CORRECT)
  • False

Correct

5. In the assessment, the result of the psutil.cpu_percent() function is “.6”. What does this mean?

  • The CPU is being utilized at 60% of its capacity.
  • The system has 0.6 cores available for processing tasks.
  • The CPU utilization is 0.6%, indicating very low CPU usage. (CORRECT)
  • The function has encountered an error, and 0.6 is the error code.

Correct

6. What is the correct order of arguments when using the rsync command?

  • [Destination] [Options] [Source-Files-Dir]
  • [Source-Files-Dir] [Destination] [Options]
  • [Options] [Destination] [Source-Files-Dir]
  • [Options] [Source-Files-Dir] [Destination] (CORRECT)

Correct

7. True or false: In the assessment, the multisync.py script is designed to backup data sequentially, one task after another.

  • True
  • False (CORRECT)

Correct

8. True or false: A script that often waits for I/O operations to complete could be called a CPU-bound task.

  • True
  • False (CORRECT)

Correct

9. How does rsync (remote sync) primarily optimize data transfer?

  • rsync prioritizes files based on their importance and transfers critical files first.
  • rsync duplicates all files every time to ensure no data is missed during transfer.
  • rsync uses advanced compression algorithms to reduce the size of files before transferring.
  • rsync transfers and synchronizes files by comparing the modification time and size, ensuring only changed data is transferred. (CORRECT)

Correct
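
By default, rsync skips a file when its size and modification time already match at the destination (its "quick check"), and for files that do differ it uses the delta-transfer algorithm to send only the changed parts. The sketch below reproduces only the quick-check decision in Python; it is not rsync's implementation, and needs_sync is a hypothetical helper.

```python
import os
import tempfile

def needs_sync(src_path, dest_path):
    """Hypothetical helper mimicking rsync's default quick check."""
    if not os.path.exists(dest_path):
        return True  # nothing at the destination yet
    src, dest = os.stat(src_path), os.stat(dest_path)
    # Compare size and modification time (to whole seconds in this sketch).
    return src.st_size != dest.st_size or int(src.st_mtime) != int(dest.st_mtime)

with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "report.txt")
    dest_file = os.path.join(tmp, "report_copy.txt")
    with open(src_file, "w") as f:
        f.write("quarterly numbers\n")
    before = needs_sync(src_file, dest_file)  # the copy does not exist yet

    with open(dest_file, "w") as f:
        f.write("quarterly numbers\n")
    st = os.stat(src_file)
    os.utime(dest_file, ns=(st.st_atime_ns, st.st_mtime_ns))  # same timestamps
    after = needs_sync(src_file, dest_file)  # same size and same mtime

print(before, after)  # → True False
```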

10. In the Qwiklab, what did you use the psutil.disk_io_counters() function to do?

  • To estimate file space usage on the disk
  • To configure network interface parameters
  • To monitor real-time network traffic
  • To retrieve disk I/O statistics (CORRECT)

Correct

11. In this example, how was the backup script improved to reduce the backup time significantly?

  • By using more verbose logging
  • By increasing the network bandwidth
  • By utilizing multiprocessing (CORRECT)
  • By compressing the files before transferring

Correct

12. In the activity, what was the psutil python3 module used for?

  • Monitoring network bandwidth (CORRECT)
  • Analyzing GPU performance
  • Checking power consumption
  • Checking CPU usage (CORRECT)

Correct

13. In the multisync.py script, what is the role of the map method of the Pool object?

  • To distribute the tasks evenly across available CPUs (CORRECT)
  • To map the output of one task to the input of another
  • To map all tasks to a single processor
  • To create a mapping of task dependencies

Correct
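
A minimal sketch of Pool.map spreading independent tasks across worker processes. This is not the course's multisync.py: in a backup script the worker would presumably run rsync for one folder, but here count_primes is a hypothetical CPU-heavy stand-in so the example is self-contained. The if __name__ == "__main__": guard is needed on platforms that start workers by spawning a fresh interpreter.

```python
from multiprocessing import Pool

def count_primes(limit):
    """CPU-heavy stand-in task: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def run_parallel(limits):
    # Pool() starts one worker process per CPU core by default; map() splits
    # the inputs among the workers and returns results in input order.
    with Pool() as pool:
        return pool.map(count_primes, limits)

if __name__ == "__main__":
    print(run_parallel([10_000, 20_000, 30_000, 40_000]))
```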

14. Which of the following are options for the rsync command? Select all that apply.

  • -p
  • -z (CORRECT)
  • -v (CORRECT)
  • -a (CORRECT)

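Correct. The expected answers are -a (archive mode), -v (verbose), and -z (compress data during transfer). Strictly speaking, -p (preserve permissions) is also a valid rsync option, and it is already implied by -a.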

15. True or false: In this example, the efficiency of the script was improved by compressing the files before transferring.

  • False (CORRECT)
  • True

Correct

16. True or false: In order to check how much your program utilizes the CPU using psutil.cpu_percent(), you first need to install pip3, the Python package installer.

  • True
  • False (CORRECT)

Correct

17. What makes rsync (remote sync) distinct from other file transfer methods?

  • rsync encrypts all files before transfer, ensuring maximum security.
  • rsync increases the speed of the internet connection during file transfer.
  • rsync requires manual selection of each file for transfer.
  • rsync uses the delta transfer algorithm, meaning it transfers only the differences between source and destination files. (CORRECT)

Correct

18. Which command did you use for checking disk I/O?

  • df -h
  • netstat
  • psutil.disk_io_counters() (CORRECT)
  • diskcheck

Correct

19. Which of the following performance metrics did you explore to identify system limitations? Select all that apply.

  • Checking power consumption
  • Monitoring network bandwidth (CORRECT)
  • Checking CPU usage (CORRECT)
  • Analyzing GPU performance

Correct

20. Why is it necessary to grant executable permission to the multisync.py script before running it?

  • To enable the script to access system files
  • To allow the script to use network resources
  • To permit the script to modify its own code
  • To allow the operating system to execute the script as a program (CORRECT)

Correct
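
On Linux or macOS you would normally run chmod +x multisync.py. A minimal Python equivalent, sketched here with a hypothetical throwaway script, adds the execute bits with os.chmod (similar to chmod a+x). The shebang line at the top of the script tells the operating system which interpreter to use. On Windows, os.chmod can only toggle the read-only flag, so this applies to Unix-like systems.

```python
import os
import stat
import tempfile

def make_executable(path):
    """Add execute permission for user, group, and others (like chmod a+x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

# A hypothetical throwaway script with a shebang line.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("#!/usr/bin/env python3\nprint('hello')\n")
    script_path = f.name

make_executable(script_path)
new_mode = os.stat(script_path).st_mode
print(bool(new_mode & stat.S_IXUSR))  # → True on Linux and macOS
os.remove(script_path)
```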

21. Which of the following options for the rsync command provides a verbose output?

  • -q
  • -z
  • -b
  • -v (CORRECT)

Correct

22. In the rsync command syntax, what does the [Destination] argument represent?

  • The directory or file where the data will be synchronized to (CORRECT)
  • The options or flags that modify the behavior of the command
  • The source directory or file that needs to be synchronized
  • The name of the command itself

Correct

23. In the assessment, 904123904 bytes were written to disk. You found this result using the ______________ function.

  • psutil.net_io_counters()
  • psutil.memory_info()
  • psutil.disk_io_counters() (CORRECT)
  • psutil.cpu_percent()

Correct

24. What is the following Python script used for?

import psutil
psutil.cpu_percent()

  • Memory usage
  • System uptime
  • CPU utilization (CORRECT)
  • Network performance 

Correct

25. If you want to synchronize files from the directory /home/user/docs to /backup/docs with verbose output, which of the following rsync commands would you use?

  • rsync /home/user/docs /backup/docs -v
  • rsync -v /backup/docs /home/user/docs
  • rsync -v /home/user/docs /backup/docs (CORRECT)
  • rsync /backup/docs /home/user/docs

Correct
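
A minimal sketch of assembling the same command from Python in the order rsync expects: options, then source, then destination. The helper names build_rsync_command and run_rsync are hypothetical; run_rsync is only defined, not called, so the example does not copy any files.

```python
import shutil
import subprocess

def build_rsync_command(source, destination, verbose=True, archive=False, compress=False):
    """Hypothetical helper: rsync [Options] [Source-Files-Dir] [Destination]."""
    cmd = ["rsync"]
    if archive:
        cmd.append("-a")  # archive mode: recurse and preserve permissions, times, etc.
    if compress:
        cmd.append("-z")  # compress data during transfer
    if verbose:
        cmd.append("-v")  # verbose output
    cmd += [source, destination]
    return cmd

def run_rsync(cmd):
    """Run the command if rsync is installed; raises CalledProcessError on failure."""
    if shutil.which("rsync") is None:
        raise FileNotFoundError("rsync is not installed")
    subprocess.run(cmd, check=True)

cmd = build_rsync_command("/home/user/docs", "/backup/docs")
print(" ".join(cmd))  # → rsync -v /home/user/docs /backup/docs
```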

26. What is the purpose of the Pool class in the multiprocessing Python module as used in the multisync.py script?

  • To create a single process for each task
  • To manage a pool of worker processes (CORRECT)
  • To synchronize execution of processes
  • To limit the CPU usage of the script

Correct

27. Which of the following statements best describes the primary purpose of the multiprocessing module in Python?

  • It allows for the execution of multiple threads within a single process.
  • It provides support for parallel execution of code using multiple CPU cores. (CORRECT)
  • It manages multiple Python interpreters in the same program.
  • It facilitates asynchronous I/O operations without using threads or processes.

Correct

CONCLUSION – Slowness

In conclusion, this module has provided a comprehensive understanding of the factors that contribute to slow performance in machines or programs. By learning to identify bottlenecks and exhausted resources with tools like iotop, iftop, and Activity Monitor, you’ve gained insight into diagnosing performance issues effectively. You’ve also learned how the CPU, RAM, and cache are used, which helps you identify potential causes of slowness.

Furthermore, you’ve delved into writing efficient code and using profilers to pinpoint areas for improvement. Understanding data structures such as lists, tuples, dictionaries, and sets has equipped you to choose the most suitable one for your needs and to avoid expensive loops. Lastly, the module addressed complex slowness problems and covered strategies such as concurrency and caching services that make code execution more efficient. With an understanding of how threads can speed up code execution, you’re well prepared to optimize performance in your projects.