Debugging Under Pressure: My Process for Finding Bugs Faster
Introduction
In my opinion, debugging is one of the most important skills a developer can have. I handle a lot of incident response, and over time I’ve developed this process to stay organized, avoid assumptions, and get to the root cause faster.
Step 1: Don’t Make the Situation Worse
Take a deep breath. The last thing you want to do is rush and make a bad decision that makes everything worse. Your job is to be the calm in the storm, and the source of truth for what's going on.
Avoid phrases like “it's probably X” or “I think it's Y”. I get it, you've got 20 people bombarding you with questions and you want to be helpful by giving them hope. At that moment opinions don't matter, and it's better to just say “I don't know, but I'll find out.”
In the rush to fix the problem, it's tempting to apply patches based on what you think is happening instead of what the evidence shows. This is where a lot of devs make the situation worse. Maybe you restore a DB backup thinking that will fix the issue, but it doesn't. Now it's still broken and you've lost data.
It's far better to spend a couple of extra minutes understanding the situation. You want to go in with a scalpel, not a sledgehammer.
Step 2: Make a Debug Document
The next issue a lot of devs run into is wasting time chasing unrelated bugs. What I do is open a Word document and add the following four sections to it.
Bug Description: Write down what the bug is and be as precise as possible when describing the issue. It doesn't have to be long. It's just there to keep you focused, because it's really easy to get lost down a rabbit hole or start chasing bugs you find that aren't related to the incident at hand.
Knowns: This is everything you know for a fact about the issue. Don't include anything that you can't prove via logs or source code. The goal is to keep track of everything you learn while looking into the issue.
Unknowns: Anytime a question comes up that you don't know the answer to, write it down here. It doesn't matter if you're the one asking or if it's someone else. As soon as you have the answer, mark the question as solved, but don't delete it. This is helpful for the incident response report. By the end, your explanation of the issue should account for every known fact and answer every unknown.
Logs & Code Snippets: This is your evidence. It backs up everything listed in your Knowns section. You're going to have a lot of tabs and pages open while debugging, and it's easy to lose track of what you find along the way. This lets you keep all of it in one easy-to-find place. Be sure to include where the log or code came from as well.
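If you'd rather keep the document in code than in a word processor, here is a minimal sketch of the same four sections in Python. The class names, fields, and the sample incident are all made up for illustration; the only idea taken from the steps above is that unknowns get marked as solved instead of deleted, and every piece of evidence records where it came from.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Unknown:
    question: str
    answer: Optional[str] = None  # None means the question is still open


@dataclass
class DebugDoc:
    description: str
    knowns: List[str] = field(default_factory=list)
    unknowns: List[Unknown] = field(default_factory=list)
    # Each evidence entry is (where it came from, the log line or snippet).
    evidence: List[Tuple[str, str]] = field(default_factory=list)

    def resolve(self, question: str, answer: str) -> None:
        """Mark a question as solved without removing it from the document."""
        for unknown in self.unknowns:
            if unknown.question == question:
                unknown.answer = answer
                return
        raise KeyError(question)

    def open_questions(self) -> List[str]:
        return [u.question for u in self.unknowns if u.answer is None]

    def render(self) -> str:
        lines = ["Bug Description", self.description, "", "Knowns"]
        lines += [f"- {known}" for known in self.knowns]
        lines += ["", "Unknowns"]
        for u in self.unknowns:
            if u.answer is None:
                lines.append(f"[open] {u.question}")
            else:
                lines.append(f"[solved] {u.question} -> {u.answer}")
        lines += ["", "Logs & Code Snippets"]
        lines += [f"{source}: {snippet}" for source, snippet in self.evidence]
        return "\n".join(lines)


# Made-up incident, purely to show the shape of the document.
doc = DebugDoc(description="Checkout returns HTTP 500 when the cart has more than 50 items")
doc.knowns.append("Error first appears in the API log at 14:02 UTC")
doc.evidence.append(("api.log line 8841", "500 POST /api/checkout"))
doc.unknowns.append(Unknown("Does it affect every user?"))
doc.unknowns.append(Unknown("Did anything deploy before 14:02?"))
doc.resolve("Does it affect every user?", "No, only carts over 50 items")
print(doc.render())
```

When open_questions() comes back empty and every known is backed by an evidence entry, you're probably ready to write the incident report.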
Step 3: Assume Nothing
I can't stress this enough: assume nothing. The goal here is to solve the problem as quickly as possible, while avoiding distractions.
My advice is to start with the known issue and work backwards. Let's say you have a headless front end where the bug shows up and an API backend. Instead of jumping into the backend where you think the bug is, start by looking at the front end where the bug actually is. Check which endpoints are being called and what data is being sent. Follow that thread through the tech stack, making sure each step works as expected. When you find something, add it to your document.
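To make "follow that thread" concrete, here is a rough sketch that runs a list of checks in order, starting at the symptom, and stops at the first hop that doesn't behave as expected. Everything in it is hypothetical: the endpoint, the payloads, and the checks stand in for whatever you would actually copy out of the browser's network tab and the API logs.

```python
from typing import Callable, List, Optional, Tuple

# Each check is (label, function that returns True when that hop looks correct).
Check = Tuple[str, Callable[[], bool]]


def first_failing_hop(checks: List[Check]) -> Optional[str]:
    """Run the checks in order and return the label of the first one that fails."""
    for label, check in checks:
        if not check():
            return label
    return None  # every hop behaved as expected


# Hypothetical captured data: what the front end sent and what the API logged.
sent_by_frontend = {"endpoint": "/api/orders", "body": {"quantity": "3"}}
received_by_api = {"endpoint": "/api/orders", "body": {"quantity": "3"}}

checks: List[Check] = [
    ("front end calls the expected endpoint",
     lambda: sent_by_frontend["endpoint"] == "/api/orders"),
    ("front end sends quantity as an integer",
     lambda: isinstance(sent_by_frontend["body"]["quantity"], int)),
    ("API receives exactly what the front end sent",
     lambda: received_by_api == sent_by_frontend),
]

print(first_failing_hop(checks))
```

In this made-up case the first failure is on the front end (quantity is sent as a string), so the backend never needed to be opened.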
A lot of debugging is trying to figure out what could be causing the issue. However, it's often faster to figure out what's not causing the bug. A single test that eliminates a large number of possibilities at once is worth the time it takes to run. This is especially important when the issue is difficult to track down. Knowing that you're looking in the right place will save you a ton of time.
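One way to eliminate a lot of possibilities per test is to bisect. The sketch below is a plain binary search, not anything specific to a tool: it assumes you have an ordered list of candidates (releases here, but it could be commits or config changes), that the newest one is known to be broken, and that once the bug appears it stays in every later candidate. The release names and the is_bad check are invented for the example.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def first_bad(candidates: Sequence[T], is_bad: Callable[[T], bool]) -> Tuple[T, int]:
    """Return the first bad candidate and how many tests it took to find it.

    Assumes candidates are ordered oldest to newest, the last one is bad,
    and every candidate after the first bad one is also bad.
    """
    lo, hi = 0, len(candidates) - 1  # the first bad index is always within [lo, hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(candidates[mid]):
            hi = mid       # the bug is at mid or earlier
        else:
            lo = mid + 1   # everything up to mid is ruled out
    return candidates[lo], tests


# Hypothetical history: 64 releases, with the bug introduced in release-41.
releases = [f"release-{n}" for n in range(1, 65)]


def is_bad(release: str) -> bool:
    # Stand-in for a real test, such as deploying to staging and replaying the request.
    return int(release.split("-")[1]) >= 41


culprit, tests_run = first_bad(releases, is_bad)
print(culprit, tests_run)  # release-41 6
```

Each test rules out half of what's left, which is why 64 candidates only take 6 tests instead of up to 63.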
Step 4: If You Can’t Reproduce It, Improve Visibility
Sometimes you can't recreate the bug. I've seen a lot of devs send the ticket back just because they couldn't recreate the problem on their end. Don't do this. It usually comes back to you anyway and only delays the fix.
My previous tip about eliminating possibilities is very helpful in these cases. I like to add targeted logging, so that when the bug happens again I have new information that helps me debug it.
The goal is to create a trigger that gives you more insight into the issue. If you keep doing this, even the hardest-to-solve bugs get fixed over time.
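Here is a small sketch of what such a trigger might look like using Python's standard logging module. The function, the logger name, and the "negative total" condition are all hypothetical. The point is only the pattern: when a state shows up that should be impossible, log the inputs that produced it so the next occurrence tells you something new.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("orders")  # hypothetical logger name


def apply_discount(order_id: str, total: float, discount: float) -> float:
    """Hypothetical function suspected of occasionally producing bad totals."""
    final = total - discount
    if final < 0:
        # Trigger: this should never happen, so capture every input that led here.
        logger.warning(
            "negative total: order_id=%s total=%s discount=%s",
            order_id, total, discount,
        )
    return final


apply_discount("A-1", 10.0, 25.0)  # fires the trigger
apply_discount("A-2", 10.0, 5.0)   # normal path, logs nothing
```

Logging only when the trigger fires keeps the noise down, and including the order ID means you can tie the log line back to a real report when it comes in.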
Step 5: Communication
I touched on this a little bit in Step 1, but communication is critical in high-stress situations. Try to stay calm and don't let your emotions get the better of you. Whatever you do, don't get mad, even when it's someone's fault.
Your goal is to be 100% correct in everything you communicate to the team. People need to know they can trust what you're saying to be true, so they can take action on their end. While you're working on the bug, other people are communicating the issue to management and clients. If you give them wrong info, not only do you look bad, you make the entire company look bad.
Final Thoughts
There is a big difference between debugging your own code and restoring services during an emergency outage. The real test is when production breaks, people are asking questions, and there’s pressure to move fast. If you want to be the go-to person on your team during incidents, you need to earn the trust of your colleagues and show that they can rely on you to make things better, not worse.
We all make mistakes; that's what makes us human. However, being known as the “buggy developer” has a huge negative impact on your career and growth. Your reputation is built on how often you catch your mistakes before they impact others. One of the best ways to build trust is by debugging your own work before submitting the PR. Your goal isn't to be perfect. It's to find your mistakes before someone else does.
Taking an extra 10 to 15 minutes to test and validate your changes now is far cheaper than having the same issue come back six months later when you're under pressure delivering something else.
Best of luck, fellow devs. You got this!