COURSE 4 – TROUBLESHOOTING AND DEBUGGING TECHNIQUES

Module 3: Crashing Programs

GOOGLE IT AUTOMATION WITH PYTHON PROFESSIONAL CERTIFICATE

Complete Coursera Study Guide

INTRODUCTION – Crashing Programs

In this module, you’ll be introduced to the age-old question, “Why has my program crashed?” You’ll learn how to troubleshoot system crashes and application crashes, which tools can help identify the cause of a crash, and which log files to check to find out what went wrong. Next, you’ll investigate why code crashes and what you can do to prevent it. Then, you’ll explore what happens when an error goes unhandled and an exception is raised, and you’ll learn several debugging techniques that help you identify these errors and exceptions. Finally, you’ll explore handling crashes and incidents at a much larger scale through a scenario in which a large e-commerce site throws an error 20% of the time. Once that issue has been fixed, you’ll see why communication and documentation matter during these incidents, and how writing a postmortem can help prevent issues from happening again.

Learning Objectives

  • Understand the difference between system and application crashes
  • Utilize skills in debugging and log reading to identify these crashes
  • Understand the different types of code crashes and be able to address invalid memory errors
  • Utilize techniques, like printf debugging, to troubleshoot and resolve unhandled errors and exceptions
  • Understand how communication and documentation during an outage or error is critical
  • Understand what a postmortem is and what should be included in one

PRACTICE QUIZ: WHY PROGRAMS CRASH

1. When using Event Viewer on a Windows system, what is the best way to quickly access specific types of logs?

  • Export logs
  • Create a custom view (CORRECT)
  • Click on System Reports
  • Run the head command

Nailed it! The Create Custom View action is used to filter through logs based on certain criteria.

2. An employee runs an application on a shared office computer, and it crashes. This does not happen to other users on the same computer. After reviewing the application logs, you find that the employee didn’t have access to the application. What log error helped you reach this conclusion?

  •  “No such file or directory”
  • “Connection refused”
  • “Permission denied” (CORRECT)
  • “Application terminated”

Keep it up! In this case, the “Permission denied” error means that the user didn’t have access to the application executable in order to run it.
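As a rough illustration of reading logs for messages like these, here is a minimal sketch that maps a few common error strings to a likely cause. The log lines, the KNOWN_ERRORS table, and the explain_log_line helper are all hypothetical and are not taken from the course.

```python
# Hypothetical mapping from common error strings to a likely cause.
KNOWN_ERRORS = {
    "Permission denied": "the user lacks access to a file or executable",
    "No such file or directory": "a path the program expects is missing",
    "Connection refused": "a network service is not accepting connections",
}


def explain_log_line(line):
    """Return a hint for the first known error string in the line, else None."""
    for message, hint in KNOWN_ERRORS.items():
        if message in line:
            return hint
    return None


# Made-up log lines; a real application log would be read from a file.
sample_log = [
    "2024-01-01 10:00:00 app starting",
    "2024-01-01 10:00:01 open('/opt/app/bin/run'): Permission denied",
]

hints = [explain_log_line(line) for line in sample_log]
for line, hint in zip(sample_log, hints):
    if hint:
        print(f"{line} -> {hint}")
```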

3. What tool can we use to check the health of our RAM?

  • Event Viewer
  • S.M.A.R.T. tools
  • memtest86 (CORRECT)
  • Process Monitor

Awesome! memtest86 and memtest86+ are memory analysis software programs designed to test and stress test the random access memory of an x86 architecture system for errors, by writing test patterns to most memory addresses, then reading data back and checking for errors.

4. You’ve just finished helping a user work around an issue in an application. What important but easy-to-forget step should we remember to do next?

  • Fix the code
  • Report the bug to the developers (CORRECT)
  • Reinstall the program
  • Change the user’s password

Right on! If there is a repeatable error present in a program, it is proper etiquette to report the bug in detail to the developer.

5. A user is experiencing strange behavior from their computer. It is running slowly, lagging, and freezing momentarily, which it does not usually do. The problem seems to be system-wide and not restricted to a particular application. What is the first thing you should ask the user whether they have tried?

  • Adding more RAM
  • Reinstalling Windows
  • Identifying the bottleneck with a resource monitor (CORRECT)
  • Upgrading their HDD to an SSD

Woohoo! The first step is identifying the root cause of the problem. Resource monitors such as Activity Monitor (macOS), top (Linux and macOS) or Resource Monitor (Windows) can help us identify whether our bottleneck is CPU-based or memory-based.

6. A user reported an application crashes on their computer. You log in and try to run the program and it crashes again. Which of the following steps would you perform next to reduce the scope of the problem?  

  • Check the health of the RAM  
  • Switch the hard drive into another computer  
  • Check the health of the hard drive  
  • Review application logs (CORRECT)

Awesome! Reviewing the application logs is the best next step, since they may reveal the reason for the crash.

7. Where should you look for application logs on a Windows system?  

  • The /var/log directory  
  • The .xsession-errors file  
  • The Console app  
  • The Event Viewer app (CORRECT)

Great job! The Event Viewer app contains logs on a Windows system.  

8. An application fails in random intervals after it was installed on a different operating system version. What can you do to work around the issue?  

  • Use a wrapper  
  • Use a container (CORRECT) 
  • Use a watchdog  
  • Use an XML format  

Nice work! A container allows the application to run in its own environment without interfering with the rest of the system.  

9. Where is a common location to view configuration files for a web application running on a Linux server?  

  • /etc/<app folder> (CORRECT)
  • /var/log/<app folder>  
  • /srv/<app folder>  
  • /<app folder> 

Right on! The /etc directory will contain the application folder that stores configuration files.  

PRACTICE QUIZ: CODE THAT CRASHES

1. Which of the following will let code run until a certain line of code is executed?

  • Breakpoints (CORRECT)
  • Watchpoints
  • Backtrace
  • Pointers

Way to go! Breakpoints let code run until a certain line of code is executed.

2. Which of the following is NOT likely to cause a segmentation fault?

  • Wild pointers
  • Reading past the end of an array
  • Stack overflow
  • RAM replacement (CORRECT)

Right on! Replacing a RAM module in the system is not a common cause of segmentation faults.

3. A common error worth keeping in mind happens often when iterating through arrays or other collections, and is often fixed by changing the less than or equal sign in our for loop to be a strictly less than sign. What is this common error known as?

  • Segmentation fault
  • backtrace
  • The No such file or directory error
  • Off-by-one error (CORRECT)

Nice work! The off-by-one error, often abbreviated as OB1, frequently happens in computer programming when an iterative process iterates one time too many or one time too few.
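To make the off-by-one error concrete, here is a minimal sketch in Python. The function names and sample values are made up for illustration; the only difference between the two versions is the comparison operator in the loop condition.

```python
def sum_items_buggy(items):
    """Buggy: the <= comparison lets the index reach len(items), one past the end."""
    total = 0
    i = 0
    while i <= len(items):
        total += items[i]
        i += 1
    return total


def sum_items_fixed(items):
    """Fixed: the strict < comparison stops at the last valid index."""
    total = 0
    i = 0
    while i < len(items):
        total += items[i]
        i += 1
    return total


values = [10, 20, 30]
try:
    sum_items_buggy(values)
except IndexError as err:
    print(f"Buggy loop failed: {err}")

print(sum_items_fixed(values))
```

In Python the bad index raises an IndexError, while in a language like C the same mistake reads past the end of the array and may cause a segmentation fault.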

4. A very common method of debugging is to add print statements to our code that display information, such as contents of variables, custom error statements, or return values of functions. What is this type of debugging called?

  • Backtracking
  • Log review
  • Printf debugging (CORRECT)
  • Assertion debugging

Excellent! Printf debugging takes its name from the printf() function in C, which is used to display debug information, and the name stuck. This type of debugging is useful in all languages.
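A minimal sketch of printf debugging in Python might look like the following. The average function is a made-up example, and sending the debug line to stderr is one possible choice that keeps it separate from the program's normal output.

```python
import sys


def average(numbers):
    total = sum(numbers)
    count = len(numbers)
    # Temporary debug output showing intermediate values; remove once the bug is found.
    print(f"DEBUG average: total={total} count={count}", file=sys.stderr)
    return total / count


result = average([2, 4, 6])
print(result)
```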

5. When a process crashes, the operating system may generate a file containing information about the state of the process in memory to help the developer debug the program later. What are these files called?

  • Log files
  • Core files (CORRECT)
  • Metadata file
  • Cache file

Right on! Core files (or core dump files) record an image and status of a running process, and can be used to determine the cause of a crash.

6. You are a software developer who has been asked to write a program for a banking company. Your manager suggests you use assert statements in your code. What is the purpose of using assert statements in code?

  • To determine the code’s runtime
  • To translate the code to a different language
  • To catch issues and debug your code during development (CORRECT)
  • To rewrite code that produces errors

That’s right! An assert statement is beneficial to developers as it helps determine if bugs are in your code and where they are located.
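As a small sketch of how assert statements can catch issues during development, consider this hypothetical banking helper. The function name and the two conditions are assumptions chosen for illustration only.

```python
def apply_withdrawal(balance, amount):
    # Assumed invariants for this hypothetical helper; a failed assert
    # raises AssertionError with the message given after the comma.
    assert amount > 0, "withdrawal amount must be positive"
    assert amount <= balance, "withdrawal exceeds balance"
    return balance - amount


print(apply_withdrawal(100, 30))

try:
    apply_withdrawal(100, -5)
except AssertionError as err:
    print(f"Caught during development: {err}")
```

Note that Python skips assert statements when run with the -O option, so they suit development-time checks rather than validation of user input in production.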

7. How does the print statement help programmers debug code?

  • It produces the output of the code. (CORRECT)
  • It fixes the errors of the code.
  • It prints the details of the code.
  • It recommends the correct code.

Correct. The print statement sends messages or prints out the values to the output screen. If the code has errors, the command will produce the error statement as output.

8. Visual Studio Code or VS Code, popular among programmers, utilizes breakpoints. What is a breakpoint?

  • An open-source product from Microsoft
  • A debugging technique (CORRECT)
  • An integrated developer environment (IDE)
  • A location where the code error occurs

Correct. A breakpoint is a debugging technique where you set a stopping point on a specific line of code.

9. Imagine that you’re working on a new feature for a web application. As you’re writing the code, you realize that certain sections might produce runtime errors. Which Python mechanism allows you to handle runtime errors without crashing the program? 

  • Print debugging
  • Assert statements
  • Try and except blocks (CORRECT)
  • If-else conditions

That’s right! The try and except blocks in Python are specifically designed to catch and handle exceptions (runtime errors). Code that might produce an error is placed inside a try block. If an error occurs, instead of crashing the program, the code inside the except block is executed.
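Here is a minimal sketch of a try and except block. The parse_quantity function and its fallback value of None are assumptions made for illustration.

```python
def parse_quantity(text):
    """Convert user input to an int, returning None instead of crashing on bad input."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for text such as "three"; handle it here.
        print(f"Could not parse quantity: {text!r}")
        return None


good = parse_quantity("3")
bad = parse_quantity("three")
print(good, bad)
```

Catching the specific exception (ValueError) rather than every exception keeps unrelated bugs visible.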

10. You’re a web developer for an e-commerce website, and you’re noticing an increase in unexpected behaviors and errors as the site’s user base grows. Why might you choose to implement the Python logging module in your e-commerce website over traditional print() statements?

  • The logging module can only display messages on the console, similar to the print() function.
  • The logging module allows categorization of log messages based on their severity, such as DEBUG, INFO, WARNING, ERROR, and CRITICAL. (CORRECT)
  • The logging module can only capture error messages.
  • The logging module requires a third-party library to be installed.

That’s right! One of the key features of the logging module is its ability to categorize log messages based on severity levels. This helps in filtering and prioritizing issues, making it easier to diagnose and address problems.
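The severity levels can be sketched with the standard logging module as follows. The logger name, the messages, and the choice of WARNING as the threshold are illustrative assumptions; the output is captured in a StringIO object only so that the result is easy to inspect.

```python
import io
import logging

stream = io.StringIO()

logger = logging.getLogger("shop_example")  # hypothetical logger name
logger.setLevel(logging.WARNING)            # ignore DEBUG and INFO messages
logger.propagate = False                    # keep messages out of the root logger

handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
logger.addHandler(handler)

logger.debug("cart contents loaded")  # below the threshold, so it is dropped
logger.warning("payment retry")
logger.error("checkout failed")

captured = stream.getvalue()
print(captured, end="")
```

Lowering the level to logging.DEBUG would include the first message as well, which is the kind of filtering that print() alone does not offer.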

11. A team of software developers is excited to use an AI tool to help with writing and debugging some pieces of their code. Which of the following is true about using AI tools with code?

  • AI tools can provide answers to your questions in seconds. (CORRECT)
  • The answers provided by AI tools are always correct.
  • The AI tools have been used by developers for decades.
  • The AI tools have been through many development iterations and are considered as perfect tools.

That’s right! AI tools can provide you with feedback to your question in a matter of seconds.

12. Which of the following can assist in finding out if invalid operations are occurring in a program running on a Windows system?  

  • Valgrind  
  • Dr. Memory (CORRECT)
  • PDB files  
  • Segfaults 

You got it! Dr. Memory can assist in finding out if invalid operations are occurring in a program running on Windows or Linux.  

13. After getting acquainted with the program’s code, where might you start to fix a problem?  

  • Run through tests  
  • Read the comments  
  • Locate the affected function (CORRECT)
  • Create new tests  

Nicely done! Start working on the function that produced the error, and the function(s) that called it.

14. When debugging code, what command can you use to figure out how your program reached the failed state?  

  • gdb  
  • backtrace (CORRECT)
  • ulimit  
  • list  

Nice job! The backtrace command shows a summary of the function calls that led to the point where the failure occurred.
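The backtrace command belongs to debuggers such as gdb. As a rough Python analog, and not the gdb command itself, the standard traceback module can list the chain of function calls that led to an exception. The function names below are made up for illustration.

```python
import traceback


def load_config():
    return parse_port("not-a-number")


def parse_port(text):
    return int(text)  # raises ValueError for non-numeric text


try:
    load_config()
except ValueError as err:
    # Each extracted frame records one call on the way to the failure.
    frames = traceback.extract_tb(err.__traceback__)
    call_chain = [frame.name for frame in frames]
    print(" -> ".join(call_chain))
```

The last entries of the chain show that load_config called parse_port, where the error was raised.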

15. When debugging in Python, what command can you use to run the program until it crashes with an error?

  • pdb3
  • next
  • continue (CORRECT)
  • KeyError

Awesome! Running the continue command after starting the pdb3 debugger will execute the program until it finishes or crashes.

PRACTICE QUIZ: HANDLING BIGGER INCIDENTS

1. Which of the following would be effective in resolving a large issue if it happens again in the future?

  • Incident controller
  • Postmortem (CORRECT)
  • Rollbacks
  • Load balancers

Keep it up! A postmortem is a detailed document of an issue which includes the root cause and remediation. It is effective on large, complex issues.

2. During peak hours, users have reported issues connecting to a website. The website is hosted on two load-balanced servers in the cloud, which are connected to an external SQL database. Logs on both servers show an increase in CPU and RAM usage. What may be the most effective way to resolve this issue with a complex set of servers?

  • Use threading in the program
  • Cache data in memory
  • Automate deployment of additional servers (CORRECT)
  • Optimize the database

You got it! Automatically deploying additional servers to handle the loads of requests during peak hours can resolve issues with a complex set of servers.

3. It has become increasingly common to use cloud services and virtualization. Which kind of fix, in particular, does virtual cloud deployment speed up and simplify?

  • Deployment of new servers (CORRECT)
  • Application code fixes
  • Log reviewing
  • Postmortems

Right on! Virtualization makes deployment of VM servers in the cloud a fast and relatively simple process.

4. What should we include in our postmortem? (Check all that apply)

  • Root cause of the issue (CORRECT)
  • How we diagnosed the problem (CORRECT)
  • How we fixed the problem (CORRECT)
  • Who caused the problem

Sweet! In order to learn about the problem and how it happens in general, we should include what caused it this time.

Awesome! By clarifying how we identified the problem, it can be more easily identified in the future.

Excellent! In order to share with reviewers how the issue was resolved, it’s important to include what we did to solve it this time.

5. In general, what is the goal of a postmortem? (Check all that apply)

  • To identify who is at fault
  • To allow prevention in the future (CORRECT)
  • To allow speedy remediation of similar issues in the future (CORRECT)
  • To analyze all system bugs

Way to go! By describing the cause of the problem, we can learn to avoid the same circumstances in the future.

Woohoo! By describing in detail how we fixed the problem, we can help others or ourselves fix the same problem more quickly in the future.

6. A website is producing service errors when loading certain pages. Looking at the logs, one of three web servers isn’t responding correctly to requests. What can you do to restore services, while troubleshooting further?  

  • Deploy a new web server  
  • Roll back application changes  
  • Remove the server from the pool (CORRECT)
  • Create standby servers 

Great job! Removing the server from the pool will provide full service to users from the remaining web servers.

7. Which of the following persons is responsible for communicating with customers that are affected by an access issue with a website?  

  • Communications lead (CORRECT) 
  • Manager  
  • Incident controller  
  • Software engineer 

Nice work! The communications lead provides timely updates on the incident and answers questions from users.  

8. When writing an effective postmortem of an incident, what should you NOT include?  

  • What caused the issue  
  • Who caused the issue (CORRECT)
  • What the impact was  
  • The short-term remediation 

Nailed it! A postmortem of an incident should not include the person(s) who caused the issue.

FIXING ERRORS IN PYTHON SCRIPTS

1. How can you use pip3 to address the ImportError issue in your Python script, particularly when a module like matplotlib is missing? 

  • Reinstall the Python script.
  • Update the Python interpreter.
  • Install the missing module using pip3. (CORRECT)
  • Execute the script with Python 2.

Correct
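One way to see how an ImportError relates to pip3 is the minimal sketch below. The require_module helper and the deliberately nonexistent module name are hypothetical, and the printed hint assumes the package name on the package index matches the import name, which is not always the case.

```python
import importlib


def require_module(name):
    """Try to import a module by name; print an install hint if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError:
        print(f"Module '{name}' is missing; try: pip3 install {name}")
        return None


present = require_module("json")  # part of the standard library
absent = require_module("surely_missing_module_xyz")  # hypothetical missing module
```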

2. What type of error occurred when attempting to run the Python script located in the /usr/bin directory, as indicated by the provided output?

  • ImportError (CORRECT)
  • SyntaxError
  • IndexError
  • ValueError

Correct

3. How is the matplotlib Python library beneficial to programmers?

  • It offers a wide range of data visualization tools and features. (CORRECT)
  • It simplifies the process of running Python scripts concurrently.
  • It enables the creation of web applications with Python. 
  • It provides a Python code editor for writing and debugging scripts.

Correct

4. How did you resolve the MissingColumnError?

  • You added the missing column name to the data.csv file. (CORRECT)
  • You used the “ls” command to check for errors.
  • You rewrote the Python script from scratch.
  • You used the “chmod 777” command.

Correct

5. You are working to debug a recurring problem in a Python program. Which of the following approaches do you think would be the most effective way to solve it? 

  • Increase the system’s memory.
  • Restart the system.
  • Upgrade the system’s software.
  • Identify the sequence of events leading to the problem. (CORRECT)

Correct

6. In anticipation of encountering future errors or unexpected behavior in your Python scripts, which proactive debugging techniques and best practices would you incorporate into your development process? Select all that apply.

  • Establishing a systematic approach to isolate and fix errors as they arise.
  • Regularly reviewing error messages and stack traces from previous runs. (CORRECT)
  • Reinstalling Python libraries to prevent potential errors.
  • Using version control systems to track code changes and facilitate error identification. (CORRECT)
  • Implementing comprehensive unit testing to catch errors early. (CORRECT)

Correct

7. What indication did you get in the lab when you successfully completed debugging the infrastructure script? 

  • The infrastructure script prompts you to press a key to continue. 
  • The infrastructure script runs without displaying any errors. (CORRECT)
  • The infrastructure script writes a message to its log file. 
  • The infrastructure script displays a message stating the program ran successfully.

Correct

8. What is the third step in the process of debugging, following the identification of a bug’s cause?

  • Writing new code to fix the bug
  • Reporting the bug to a supervisor
  • Reproducing the bug (CORRECT)
  • Deleting the entire codebase

Correct

9. After successfully fixing the code and resolving the errors in the provided content, what is the recommended next step to ensure the continued functionality and reliability of the script?

  • Delete the script to start fresh with a clean slate.
  • Test the script thoroughly to confirm that the errors are resolved. (CORRECT)
  • Immediately share the fixed code with colleagues.
  • Reinstall the Python interpreter for optimal performance.

Correct

10. What is the cause of the MissingColumnError in the lab?

  • The infrastructure script references a column that doesn’t exist. 
  • The column name for the column with the company information is missing in the CSV file. (CORRECT)
  • The company information is missing in the CSV file. 
  • The infrastructure script does not have the necessary permissions for accessing the data.csv file.

Correct
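The lab's script is not shown here, so the following is only a sketch of how a check for a missing column name could work. The MissingColumnError class, the check_columns helper, the column names, and the sample data are all stand-ins invented for illustration.

```python
import csv
import io


class MissingColumnError(Exception):
    """Hypothetical stand-in for the error reported by the lab's script."""


def check_columns(csv_text, required):
    """Raise MissingColumnError if any required column name is absent from the header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    present = reader.fieldnames or []
    missing = [name for name in required if name not in present]
    if missing:
        raise MissingColumnError(f"missing column(s): {', '.join(missing)}")
    return present


# Made-up data: the header has an empty name where "company" should be.
broken = "name,,email\nAda,Example Corp,ada@example.com\n"
fixed = "name,company,email\nAda,Example Corp,ada@example.com\n"

try:
    check_columns(broken, ["name", "company", "email"])
except MissingColumnError as err:
    message = str(err)
    print(message)

columns = check_columns(fixed, ["name", "company", "email"])
print(columns)
```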

11. What is pip3? 

  • A numerical mathematics extension of Matplotlib
  • A command to search for missing information
  • A Python package installer (CORRECT)
  • A plotting library for the Python programming language

Correct

12. In the given scenario where a Python script located in the /usr/bin directory produces an ImportError due to a missing module (i.e., matplotlib), which of the following actions should you take to address the issue?

  • Modify the script’s code to bypass the missing module.
  • Reinstall the Python interpreter.
  • Install the missing module using pip3. (CORRECT)
  • Delete the script and recreate it from scratch.

Correct

13. What is the purpose of the following command: pip3 install matplotlib in the context of resolving the issues with the Python script and matplotlib?

  • It successfully installs the matplotlib library, enabling visualization of data. (CORRECT) 
  • It removes the matplotlib library from the system.
  • It updates the Python interpreter to the latest version.
  • It installs a Python code editor for script development.

Correct

14. Why is effective debugging an essential skill for Python developers, and how does it contribute to the overall success of a project?

  • Debugging allows developers to showcase their coding skills.
  • Debugging helps identify and rectify errors, leading to improved script functionality and reliability. (CORRECT)
  • Debugging helps developers gain a deeper understanding of Python syntax.
  • Effective debugging ensures that scripts run without any issues.

Correct

15. In the lab, which step(s) must you take to fix an ImportError? 

  • Put the missing package in the correct folder. 
  • Change the permissions for the package. 
  • Install pip3. (CORRECT)
  • Install the missing package. (CORRECT)

Correct

16. In the lab, what caused the NoFileError message? Select all that apply. 

  • Renaming the data.bak file to data.csv. (CORRECT)
  • Changing the permissions on the data.csv file. 
  • Moving the data.csv file to the working folder. 
  • Checking the working folder for the data.csv file. (CORRECT)

Correct

17. How did the chmod 777 command contribute to resolving the issues with the Python script?

  • The command changed file permissions to make the data.csv file writable. (CORRECT)
  • The command uninstalled the Matplotlib library.
  • The command modified the script’s code to fix the issue.
  • The command was used to install a missing Python library.

Correct

18. In the lab, you ran the infrastructure script and received a NoFileError message about the file named data.csv. What caused this error?

  • The infrastructure program encountered a permission problem when opening a file. 
  • The infrastructure program must be in the same folder as the file it called.
  • The infrastructure program has a typo in the name of the file it calls. 
  • The infrastructure program called a file that can’t be found. (CORRECT)

Correct
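A file that can’t be found can be checked for before a program tries to open it. The sketch below is not the lab’s script: the find_data_file helper is hypothetical, and it uses a temporary folder so the example is self-contained. It mirrors the lab scenario in which a data.bak file exists but data.csv does not.

```python
import tempfile
from pathlib import Path


def find_data_file(folder, name="data.csv"):
    """Return the path to the file if it exists, else list similarly named files."""
    path = Path(folder) / name
    if path.is_file():
        return path
    candidates = sorted(p.name for p in Path(folder).glob("data.*"))
    print(f"{name} not found in {folder}; similar files: {candidates}")
    return None


with tempfile.TemporaryDirectory() as folder:
    # Start with only a backup file, as in the lab scenario.
    (Path(folder) / "data.bak").write_text("name,company\n")
    before = find_data_file(folder)

    # Renaming the backup makes the expected file available.
    (Path(folder) / "data.bak").rename(Path(folder) / "data.csv")
    after = find_data_file(folder)
```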

19. Which sequence of actions effectively addressed the issue, starting from identifying the file extension problem to ultimately resolving the MissingColumnError in the Python script?

  • Adding the missing column name > Renaming data.bak to data.csv > Checking the data.csv file
  • Granting permissions to data.csv > Checking the data.csv file > Adding the missing column name
  • Renaming data.bak to data.csv > Adding the missing column name > Granting permissions to data.csv (CORRECT)
  • Checking the data.csv file > Renaming data.bak to data.csv > Granting permissions to “data.csv”

Correct

20. What is the function of pip3 in Python?

  • Creates graphical user interfaces
  • Runs Python scripts
  • Acts as a plotting library for Python 
  • Downloads and configures new Python modules (CORRECT)

Correct

21. If you did not write the Python program and don’t have access to the source code, what should you examine to determine where the program is running and any errors that are occurring? 

  • The results of the grantaccess command 
  • The results of the pip3 command
  • You should examine the program’s surrounding environment. (CORRECT)
  • Figure out where the program is executing and identify any errors.

Correct

22. The lab presents a scenario where you’re tasked with troubleshooting a Python script named infrastructure that is generating errors. You didn’t create the script and don’t have access to its source code. What steps in the lab enable you to troubleshoot this program? Select all that apply. 

  • Run the infrastructure script to determine whether it generates any errors. (CORRECT)
  • Obtain the source code and debug it. 
  • Research any displayed error messages to identify their cause. (CORRECT)
  • Take steps to resolve any errors displayed. (CORRECT)

Correct

23. In the context of code debugging and error resolution, what is the significance of conducting comprehensive testing after fixing the code, and how does it contribute to the overall quality of the script and project success?

  • Testing allows developers to showcase their coding skills.
  • Testing ensures that the code adheres to the latest programming standards.
  • Testing primarily focuses on optimizing the code for speed.
  • Comprehensive testing verifies that the code functions as expected after fixes, enhancing script reliability and project success. (CORRECT)

Correct

24. What is one of the primary purposes of matplotlib?

  • It focuses on numerical mathematics and extends the Python language.
  • It is primarily used for visualizing 2D plots of arrays and data. (CORRECT)
  • It provides an object-oriented API for creating graphical user interfaces.
  • It serves as a Python code editor for writing and debugging scripts.

Correct

25. Having completed the lab and worked through the process, which of the following would you want to check when debugging or troubleshooting later iterations of the same software? Select all that apply.

  • Current bug reports (CORRECT)
  • Redesigns of the user interface
  • Future software upgrades
  • More users (CORRECT)

Correct

26. Why is it essential for developers to isolate specific issues or errors in software when troubleshooting? Select all that apply.

  • Isolating issues simplifies the code and eliminates unnecessary complexity. (CORRECT)
  • Isolating issues reduces the need for comprehensive testing.
  • Effective isolation enables targeted problem-solving and prevents broader system disruptions. (CORRECT)
  • Isolating issues allows developers to work on multiple problems simultaneously.

Correct

CONCLUSION – Crashing Programs

In conclusion, this module has provided a comprehensive introduction to the common challenge of program crashes, offering insights into troubleshooting both system and application crashes. By familiarizing yourself with various tools and log files, you’ve gained the ability to effectively identify the root causes of crashes. Additionally, you’ve delved into the investigation of code crashes and explored preventive measures to mitigate such occurrences. Understanding the implications of unhandled errors and exceptions, coupled with learning debugging techniques, has equipped you with valuable skills in error identification and resolution.

Furthermore, the module has extended your understanding to handling crashes and incidents on a larger scale, illustrated by a scenario in which an e-commerce site returned errors for a significant share of requests. By emphasizing the importance of communication, documentation, and postmortem analysis in incident management, it has given you valuable strategies to prevent future issues and promote continuous improvement in software reliability.