"Works on my machine" - How to fix non-reproducible bugs?

One of the attributes of good debuggers, I think, is that they always have a lot of weapons in their toolkit. They never seem to get "stuck" for long, and there is always something else for them to try. Some of the things I've been known to do:

  1. ask for memory dumps
  2. install a remote debugger on a client machine
  3. add tracing code to builds
  4. add logging code for debugging purposes
  5. add performance counters
  6. add configuration parameters to various bits of suspicious code so I can turn on and off features
  7. rewrite and refactor suspicious code
  8. try to replicate the issue locally on a different OS or machine
  9. use debugging tools such as Application Verifier
  10. use third-party load generation tools
  11. write in-house simulation tools for load generation when the above failed
  12. use tools like GlowCode to analyse memory leaks and performance issues
  13. reinstall the client machine from scratch
  14. get registry dumps and apply them locally
  15. use registry and file watcher tools
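Item 6 (configuration switches around suspicious code) can be as simple as reading a list of disabled features at startup. Below is a minimal Python sketch, assuming a hypothetical `APP_DISABLE_FEATURES` environment variable that holds comma-separated feature names. The helper names and the `thumbnail_cache` feature are illustrative only.

```python
import os
from typing import FrozenSet, List, Mapping, Optional


def disabled_features(env: Optional[Mapping[str, str]] = None) -> FrozenSet[str]:
    """Parse the hypothetical APP_DISABLE_FEATURES variable, e.g. "thumbnail_cache,gpu"."""
    env = os.environ if env is None else env
    raw = env.get("APP_DISABLE_FEATURES", "")
    # Case-insensitive; blank entries and surrounding whitespace are ignored.
    return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())


def feature_enabled(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Every feature is on unless it is listed in the variable."""
    return name.lower() not in disabled_features(env)


def load_thumbnails(paths: List[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    # Hypothetical suspicious code path: the cache can be switched off on the
    # customer's machine to see whether the bug goes away, without a new build.
    if feature_enabled("thumbnail_cache", env):
        return [f"cached:{p}" for p in paths]
    return [f"uncached:{p}" for p in paths]
```

Asking the customer to set the variable and retry lets you narrow the search one feature at a time without shipping a new build.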

Eventually, I find the bug just gives up out of some kind of awe at my persistence. Or the client realises that it's probably a machine or client-side install or configuration issue.


Extensive logging usually helps.
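Logging pays off most when it records what went in, what came out and what blew up. Here is a minimal Python sketch of a tracing decorator (items 3 and 4 in the list above), built only on the standard `logging` module. The `traced` decorator, the `app.trace` logger name and the `parse_quantity` example function are hypothetical.

```python
import functools
import logging

log = logging.getLogger("app.trace")


def traced(func):
    """Log entry, exit and any exception for the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("enter %s args=%r kwargs=%r", func.__qualname__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # log.exception logs at ERROR level and attaches the full traceback.
            log.exception("error in %s", func.__qualname__)
            raise
        log.debug("exit %s result=%r", func.__qualname__, result)
        return result
    return wrapper


@traced
def parse_quantity(text):
    return int(text)
```

To turn tracing on, configure the `app.trace` logger at `DEBUG` level, for example with `logging.basicConfig(level=logging.DEBUG)`. At the default `WARNING` level the entry/exit lines are skipped, but the error line (with its traceback) is still emitted.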


The easiest way is always to watch the customer in action (assuming the problem is readily reproducible by the customer). Problems often arise from the customer's computer environment, conflicts with other programs, and so on; these are details you will not be able to catch on your dev rig. So a site visit might be useful, but if that's not convenient, tools like RealVNC can let you watch the customer 'do their thing'.

(watching the customer in action also allows you to catch them out in any WTF moments that they might have)
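When neither a site visit nor a remote session is possible, a cheaper first step is to capture the same environment facts on both machines and compare them. A minimal Python sketch follows. Which facts actually matter (OS build, locale, environment variables, installed software) depends on your application, so the keys below are only an assumed starting point, and the `snapshot`/`diff` names are hypothetical.

```python
import os
import platform
import sys
from typing import Dict, Mapping, Optional, Tuple

MISSING = "<missing>"


def snapshot(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Capture facts that commonly differ between a dev rig and a customer machine."""
    env = os.environ if env is None else env
    facts = {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "fs_encoding": sys.getfilesystemencoding(),
    }
    # Prefix environment variables so they cannot collide with the keys above.
    facts.update({f"env:{key}": value for key, value in env.items()})
    return facts


def diff(mine: Mapping[str, str], theirs: Mapping[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {key: (mine, theirs)} for every key that differs or exists on one side only."""
    result = {}
    for key in sorted(set(mine) | set(theirs)):
        pair = (mine.get(key, MISSING), theirs.get(key, MISSING))
        if pair[0] != pair[1]:
            result[key] = pair
    return result
```

On the customer's side you could write the snapshot to a file with `json.dump` and have them send it back, then load it locally and `diff` it against your own `snapshot()`. The entries that differ are the first candidates for explaining "works on my machine".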

Now, if the problem is intermittent, things get more complicated. The best approach is to log useful information wherever you suspect problems could occur, and perhaps use a tool like Splunk to index the log files during analysis. A diagnostic build (i.e., one with extra logging) can help here.
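A diagnostic build does not have to be a separate binary. A switch read at startup can raise the log level, and each event can be written as one `key=value` line, a layout that log indexers such as Splunk can usually pick fields out of without custom parsing. This is a sketch under assumptions: the `APP_DIAGNOSTICS` environment variable, the `app` logger names and the `log_event` helper are all hypothetical.

```python
import logging
import os
from typing import Mapping, Optional

log = logging.getLogger("app.diag")


def diagnostics_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when the hypothetical APP_DIAGNOSTICS variable is set to 1/true/yes."""
    env = os.environ if env is None else env
    return env.get("APP_DIAGNOSTICS", "").strip().lower() in {"1", "true", "yes"}


def format_event(event: str, **fields: object) -> str:
    """Render 'event=<name> key=value ...' with sorted keys; quote values containing spaces, quotes or '='."""
    parts = [f"event={event}"]
    for key, value in sorted(fields.items()):
        text = str(value)
        if any(ch in text for ch in ' "='):
            text = '"' + text.replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


def configure(env: Optional[Mapping[str, str]] = None) -> None:
    """Call once at startup: DEBUG output only in diagnostic mode, warnings and above otherwise."""
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(threadName)s %(message)s"))
    app_log = logging.getLogger("app")
    app_log.addHandler(handler)
    app_log.setLevel(logging.DEBUG if diagnostics_enabled(env) else logging.WARNING)


def log_event(event: str, level: int = logging.DEBUG, **fields: object) -> None:
    log.log(level, "%s", format_event(event, **fields))
```

Sorting the keys keeps every line for a given event in the same shape, which makes lines easier to compare by eye and to search once indexed.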


I'm in the middle of implementing an automated error reporting system that sends me information about any exception the app encounters (currently via email, although you could use a web service).

That way I get (nearly) all the information I would have if I were sitting in front of VS2008, and it really helps me work out what the problem is.
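A stripped-down version of such a reporter might look like the Python sketch below. It is an illustration, not the poster's .NET implementation: it collects the exception type, message, full traceback and a few platform facts, then hands the serialized report to a pluggable `send` callable, so email (for example via the standard library's `smtplib`) or an HTTP endpoint can be swapped in. The `build_report` and `report_exception` names are hypothetical.

```python
import datetime
import json
import platform
import sys
import traceback
from typing import Callable, Dict


def build_report(exc: BaseException, app_version: str = "unknown") -> Dict[str, str]:
    """Collect what you would otherwise read off the debugger."""
    return {
        "time_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "app_version": app_version,
        "type": type(exc).__name__,
        "message": str(exc),
        # Includes the full stack and any chained causes.
        "traceback": "".join(traceback.format_exception(type(exc), exc, exc.__traceback__)),
        "os": platform.platform(),
        "python": platform.python_version(),
    }


def report_exception(exc: BaseException, send: Callable[[str], None],
                     app_version: str = "unknown") -> None:
    """Serialize the report and pass it to a transport; reporting must never crash the app."""
    try:
        send(json.dumps(build_report(exc, app_version), indent=2))
    except Exception as send_error:  # e.g. mail server unreachable
        print(f"error report could not be sent: {send_error!r}", file=sys.stderr)
```

In the app you would call `report_exception(err, send=...)` from your top-level error handler. Keeping the transport separate also makes the reporter easy to test with a function that just stores the report.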

The customers are also usually (sorta) impressed that I know about their problem as soon as they encounter it!

Also, if you handle the Application.ThreadException event, you can send back info on unhandled exceptions too!
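`Application.ThreadException` is specific to Windows Forms. Outside .NET, the nearest Python equivalents are `sys.excepthook` (uncaught exceptions in the main thread) and `threading.excepthook` (uncaught exceptions in other threads, Python 3.8+). The sketch below routes both to a hypothetical `on_unhandled` callback, for instance one that sends an error report. It still calls the previously installed hooks, so the usual traceback output is preserved.

```python
import sys
import threading
from types import TracebackType
from typing import Callable, Optional, Type


def install_unhandled_hooks(on_unhandled: Callable[[BaseException], None]) -> None:
    """Route uncaught exceptions from the main thread and worker threads to on_unhandled."""
    previous_sys_hook = sys.excepthook
    previous_thread_hook = threading.excepthook  # Python 3.8+

    def sys_hook(exc_type: Type[BaseException], exc: BaseException,
                 tb: Optional[TracebackType]) -> None:
        on_unhandled(exc)
        previous_sys_hook(exc_type, exc, tb)  # keep the default traceback printout

    def thread_hook(args) -> None:
        # args has exc_type, exc_value, exc_traceback and thread attributes.
        # The default thread hook silently ignores SystemExit, so we skip it too.
        if args.exc_value is not None and args.exc_type is not SystemExit:
            on_unhandled(args.exc_value)
        previous_thread_hook(args)

    sys.excepthook = sys_hook
    threading.excepthook = thread_hook
```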