
How do you reproduce bugs that occur sporadically?

We have a bug in our application that does not occur every time, so we don't know its "logic". Today I could not reproduce it even once in 100 attempts.

Disclaimer: This bug exists and I've seen it. It's not a PEBKAC or anything similar.

What are common techniques for reproducing this kind of bug?

asked Mar 25 '10 13:03 by guerda



2 Answers

Analyze the problem in a pair and pair-read the code. Make notes of the facts you KNOW to be true and try to work out which logical preconditions must hold for this to happen. Follow the evidence like a CSI.

Most people instinctively say "add more logging", and this may be a solution. But for a lot of problems this just makes things worse, since logging can change timing dependencies enough to make the problem more or less frequent. Changing the frequency from 1 in 1,000 to 1 in 1,000,000 will not bring you closer to the true source of the problem.

So even if your logical reasoning does not solve the problem, it will probably give you a few specifics you can investigate with logging or assertions in your code.
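One way to combine targeted assertions with logging that disturbs timing as little as possible is to keep recent events in memory and only print them when an assertion fails. The following is a minimal sketch of that idea, not a definitive implementation. The names `FlightRecorder`, `check`, `InvariantViolation`, and the `withdraw` function are hypothetical and exist only for illustration.

```python
import collections
import threading
import time


class FlightRecorder:
    """Keeps the last `capacity` events in memory and writes nothing until asked."""

    def __init__(self, capacity=1000):
        # A bounded deque silently drops the oldest event when it is full.
        self._events = collections.deque(maxlen=capacity)

    def record(self, message, **fields):
        # Appending to a deque is cheap compared with formatting and
        # writing a log line, so it should disturb timing less.
        self._events.append(
            (time.monotonic(), threading.get_ident(), message, fields)
        )

    def dump(self):
        # Formatting happens only here, after the failure has been detected.
        return [
            f"{ts:.6f} [thread {tid}] {msg} {fields}"
            for ts, tid, msg, fields in self._events
        ]


recorder = FlightRecorder(capacity=3)


class InvariantViolation(AssertionError):
    """Raised when a precondition we believed to hold turns out to be false."""


def check(condition, description):
    """Raise with the recent event history attached when a precondition fails."""
    if not condition:
        history = "\n".join(recorder.dump())
        raise InvariantViolation(f"{description}\n{history}")


def withdraw(balance, amount):
    """Hypothetical stand-in for a function suspected of misbehaving."""
    recorder.record("withdraw", balance=balance, amount=amount)
    check(amount <= balance, "withdrawal exceeds balance")
    return balance - amount
```

Calling `withdraw(70, 500)` raises `InvariantViolation`, and the exception message carries the last few recorded events. Even an in-memory recorder can still shift timing slightly, so it is worth comparing the failure rate with and without it.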

answered Oct 01 '22 23:10 by krosenvold


There is no general good answer to the question, but here is what I have found:

  1. It takes a talent for this kind of thing. Not all developers are best suited for it, even if they are superstars in other areas. So know your team, who has a talent for it, and hope you can give them enough candy to get them excited about helping you out, even if it isn't their area.

  2. Work backwards, and treat it like a scientific investigation. Start with the bug: what you see that is wrong. Develop hypotheses about what could cause it (this is the creative/imaginative part, the art that not everyone has the talent for) - and it helps a lot to know how the code works. For each of those hypotheses (preferably sorted by what you think is most likely - again pure gut feel here), develop a test that tries to eliminate it as the cause, and test the hypothesis. A single failed prediction doesn't mean the hypothesis is wrong. Keep testing the hypothesis until it is confirmed to be wrong (although as it gets less likely you may want to move on to another hypothesis first, just don't discount this one until you have a definitive failure).

  3. Gather as much data as you can during this process: extensive logging and whatever else is applicable. Do not discount a hypothesis because you lack the data; remedy the lack of data instead. Quite often the inspiration for the right hypothesis comes from examining the data: noticing something off in a stack trace, a weird entry in a log, something missing from a database that should be there, etc.

  4. Double check every assumption. So many times I have seen an issue not get fixed quickly because some general method call was not investigated further, and the problem was simply assumed not to be there. "Oh that, that should be simple." (See point 1).
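The hypothesis-testing loop in points 2 and 3 can be sketched as a small harness that runs a suspect scenario many times with recorded random seeds, so that any failing run can be replayed exactly. This is only an illustration under one assumption: the hypothesis being tested is that the failure depends on input data that a seed can control. Timing-related failures would need a different setup. The names `find_failing_seeds` and `average_batch` are hypothetical.

```python
import random


def find_failing_seeds(scenario, attempts=1000, base_seed=0):
    """Run `scenario(rng)` once per seed and return (seed, error) for each failure."""
    failures = []
    for seed in range(base_seed, base_seed + attempts):
        rng = random.Random(seed)  # a private generator, so runs are repeatable
        try:
            scenario(rng)
        except Exception as exc:
            # Keep the seed: it is the data needed to replay this exact run.
            failures.append((seed, repr(exc)))
    return failures


def average_batch(rng):
    """Hypothetical suspect code: fails only when the batch happens to be empty."""
    batch = [rng.random() for _ in range(rng.randint(0, 20))]
    return sum(batch) / len(batch)  # ZeroDivisionError when the batch is empty


if __name__ == "__main__":
    failing = find_failing_seeds(average_batch, attempts=1000)
    print(f"{len(failing)} of 1000 runs failed")
    for seed, error in failing[:5]:
        print(f"replay with random.Random({seed}): {error}")
```

If no seed ever fails, that is evidence against the "bad input" hypothesis, though as noted above a single missed prediction is not a definitive refutation. If some seeds do fail, rerunning `average_batch(random.Random(seed))` with one of them reproduces the failure on demand.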

If you run out of hypotheses, the usual cause is insufficient knowledge of the system (this is true even if you wrote every line of code yourself). In that case you need to review the code and gain additional insight into the system to come up with a new idea.

Of course, none of the above guarantees anything, but that is the approach that I have found gets results consistently.

answered Oct 01 '22 22:10 by Yishai