Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the best way to unit test code that generates random output?

Specifically, I've got a method picks n items from a list in such a way that a% of them meet one criterion, and b% meet a second, and so on. A simplified example would be to pick 5 items where 50% have a given property with the value 'true', and 50% 'false'; 50% of the time the method would return 2 true/3 false, and the other 50%, 3 true/2 false.

Statistically speaking, this means that over 100 runs, I should get about 250 true/250 false, but because of the randomness, 240/260 is entirely possible.

What's the best way to unit test this? I'm assuming that even though technically 300/200 is possible, it should probably fail the test if this happens. Is there a generally accepted tolerance for cases like this, and if so, how do you determine what that is?

Edit: In the code I'm working on, I don't have the luxury of using a pseudo-random number generator, or a mechanism of forcing it to balance out over time, as the lists that are picked out are generated on different machines. I need to be able to demonstrate that over time, the average number of items matching each criterion will tend to the required percentage.

like image 610
Flynn1179 Avatar asked Jun 18 '10 09:06

Flynn1179


People also ask

How do you test randomness?

Hypothesis: To test the run test of randomness, first set up the null and alternative hypothesis. In run test of randomness, null hypothesis assumes that the distributions of the two continuous populations are the same. The alternative hypothesis will be the opposite of the null hypothesis.

Should I use random data in unit tests?

No. Random values in unit tests cause them to be not repeatable. As soon as one test will pass and another will fail without any change, people lose confidence in them, undermining their value.


1 Answers

Random and statistics are not favored in unit tests. Unit tests should always return the same result. Always. Not mostly.

What you could do is trying to remove the random generator of the logic you are testing. Then you can mock the random generator and return predefined values.


Additional thoughts:

You could consider to change the implementation to make it more testable. Try to get as less random values as possible. You could for instance only get one random value to determine the deviation from the average distribution. This would be easy to test. If the random value is zero, you should get the exact distribution you expect in average. If the value is for instance 1.0, you miss the average by some defined factor, for instance by 10%. You could also implement some Gaussian distribution etc. I know this is not the topic here, but if you are free to implement it as you want, consider testability.

like image 126
Stefan Steinegger Avatar answered Oct 19 '22 11:10

Stefan Steinegger