What is the procedure for debugging a production-only error?

Let me say upfront that I'm so ignorant on this topic that I don't even know whether this question has objective answers or not. If it ends up being "not," I'll delete or vote to close the post.

Here's the scenario: I just wrote a little web service. It works on my machine. It works on my team lead's machine. It works, as far as I can tell, on every machine except for the production server. The exception that the production server spits out upon failure originates from a third-party JAR file, and is skimpy on information. I search the web for hours, but don't come up with anything useful.

So what's the procedure for tracking down an issue that occurs only on production machines? Is there a standard methodology, or perhaps a category/family of tools, for this?

The error that inspired this question has already been fixed, but that was due more to good fortune than a solid approach to debugging. I'm asking this question for future reference.

EDIT:
The answers so far seem to boil down to one word: logging. The one issue with logging is that it requires forethought. What if a problem comes up in an existing system with poor logging, or the client is worried about sensitive data and does not want extensive logging in the system in the first place?

Some related questions:
Test accounts and products in a production system
Running test on Production Code/Server

Pops asked Jun 10 '10 14:06

1 Answer

In addition to logging, which is invaluable, here are some other techniques my co-workers and I have used over the years, going back to 16-bit Windows on client machines we had no access to. (Did I date myself?) Granted, not everything can or will work.

  • Analyze any and all behavior you see.
  • Reproduce it, if at all possible.
  • Desk check: walk through the code you suspect.
  • Rubber duck it with team members AND people who have little or no familiarity with the code. The more you have to explain something to someone, the better chance you have of uncovering something.
  • Don't get frustrated. Take a 5-10 minute break. Take a quick walk across the building/street/whatever. Don't think about the problem for that time.
  • Listen to your instincts.
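
To make "analyze any and all behavior you see" a bit more concrete for the scenario in the question (a terse exception from a third-party JAR that appears only in production), here is a minimal Java sketch. It is one possible approach under stated assumptions, not a standard tool: the class name `ProdDiagnostics`, its three helper methods, the list of "sensitive" key markers, and the choice of system properties are all hypothetical names invented for illustration. It only uses standard JDK calls (`Throwable.getCause()`, `System.getProperty`). The idea is to print the full cause chain of the exception, take a small environment snapshot with sensitive-looking values redacted, and diff that snapshot against one taken on a machine where the code works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper class; none of these names come from the original post.
class ProdDiagnostics {

    // Assumption: keys containing these markers hold sensitive values.
    private static final Set<String> SENSITIVE_MARKERS = Set.of("password", "secret", "token");

    // Walks getCause() so a terse top-level exception still shows its root cause.
    // The depth cap guards against unusually long or cyclic cause chains.
    static List<String> causeChain(Throwable t) {
        List<String> chain = new ArrayList<>();
        for (Throwable cur = t; cur != null && chain.size() < 20; cur = cur.getCause()) {
            chain.add(cur.getClass().getName() + ": " + cur.getMessage());
        }
        return chain;
    }

    // Returns a sorted copy of the snapshot with sensitive-looking values masked.
    static Map<String, String> redact(Map<String, String> raw) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            String lower = e.getKey().toLowerCase(Locale.ROOT);
            boolean sensitive = SENSITIVE_MARKERS.stream().anyMatch(lower::contains);
            out.put(e.getKey(), sensitive ? "<redacted>" : e.getValue());
        }
        return out;
    }

    // Lists the keys whose values differ between two snapshots (e.g. dev vs. production).
    static Map<String, String> diff(Map<String, String> left, Map<String, String> right) {
        Set<String> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());
        Map<String, String> out = new TreeMap<>();
        for (String k : keys) {
            String l = left.get(k);
            String r = right.get(k);
            if (!Objects.equals(l, r)) {
                out.put(k, l + " -> " + r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Snapshot a few standard JVM properties that commonly differ between machines.
        Map<String, String> snapshot = new TreeMap<>();
        for (String key : List.of("java.version", "java.vendor", "os.name", "file.encoding", "user.timezone")) {
            snapshot.put(key, System.getProperty(key));
        }
        System.out.println("Environment: " + redact(snapshot));

        // Stand-in for the real failure: a wrapper hiding the informative cause.
        Exception sample = new RuntimeException("Request failed",
                new IllegalStateException("Missing config entry"));
        causeChain(sample).forEach(System.out::println);
    }
}
```

Running the same snapshot on a working machine and on the production server, then passing both maps to `diff`, narrows the search to what is actually different between the two (JVM version, default encoding, time zone, and so on). Because values are redacted by key name before anything is printed, this may also be easier to get approved by a client who is wary of extensive logging, though the marker list would need to be reviewed for the system in question.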
DevSolo answered Oct 24 '22 23:10