
Testing with random inputs best practices

NOTE: I mention the next couple of paragraphs as background. If you just want a TL;DR, feel free to skip down to the numbered questions as they are only indirectly related to this info.

I'm currently writing a Python script that does some stuff with POSIX dates (among other things). Unit testing these seems a bit difficult, though, since there's such a wide range of dates and times that can be encountered.

Of course, it's impractical for me to try to test every single date/time combination possible, so I think I'm going to try a unit test that randomizes the inputs and then reports what the inputs were if the test fails. Statistically speaking, I figure that, assuming I run it enough times, I can achieve more complete testing than I could by trying to think of all potential problem areas (I would miss things) or by testing all cases (sheer infeasibility).

So here are a few questions (mainly indirectly related to the above):

  1. What types of code are good candidates for randomized testing? What types of code aren't?
  2. How do I go about determining the number of times to run the code with randomized inputs? I ask this because I want a large enough sample to catch any bugs, but I don't want to wait a week to get my results.
  3. Are these kinds of tests well suited for unit tests, or is there another kind of test that they work well with?
  4. Are there any other best practices for doing this kind of thing?

Related topics:

  • Random data in unit tests?
Jason Baker asked Nov 01 '08 20:11


1 Answer

I agree with Federico - randomised testing is counterproductive. If a test won't reliably pass or fail, it's very hard to fix it and know it's fixed. (This is also a problem when you introduce an unreliable dependency, of course.)

Instead, you might like to make sure you've got good data coverage in other ways. For instance:

  • Make sure you have tests for the start, middle and end of every month of every year between 1900 and 2100 (if those are suitable for your code, of course).
  • Use a variety of cultures, or "all of them" if that's known.
  • Try "day 0" and "one day after the end of each month" etc.

In short, still try a lot of values, but do so programmatically and repeatably. You don't need every value you try to be a literal in a test - it's fine to loop round all known values for one axis of your testing, etc.

You'll never get complete coverage, but it will at least be repeatable.
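As a rough illustration of "programmatic and repeatable", here is a minimal Python sketch that generates the start, middle, and end of every month from 1900 to 2100, plus the "day 0" and "one day past the end" cases. The helper names (`boundary_dates`, `invalid_days`, `next_day`) are made up for this example, and `next_day` is only a stand-in for whatever code you are really testing; treating the 15th as the "middle" of a month is also just an assumption.

```python
import calendar
from datetime import date, timedelta


def boundary_dates(first_year=1900, last_year=2100):
    """Yield the first, 15th, and last day of every month in the range."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            last_day = calendar.monthrange(year, month)[1]
            for day in (1, 15, last_day):
                yield date(year, month, day)


def invalid_days(year, month):
    """Return 'day 0' and 'one day after the end of the month'."""
    last_day = calendar.monthrange(year, month)[1]
    return (0, last_day + 1)


def next_day(d):
    # Hypothetical function under test; replace with your own code.
    return d + timedelta(days=1)


def test_day_after_month_end_is_first_of_next_month():
    for d in boundary_dates():
        if d.day == calendar.monthrange(d.year, d.month)[1]:
            # Report the failing input so the failure is easy to reproduce.
            assert next_day(d).day == 1, f"failed for input {d}"


def test_invalid_days_are_rejected():
    for year in range(1900, 2101):
        for month in range(1, 13):
            for day in invalid_days(year, month):
                try:
                    date(year, month, day)
                except ValueError:
                    continue
                raise AssertionError(f"{year}-{month}-{day} was accepted")
```

Every run visits exactly the same inputs in the same order, so a failure today will still be a failure tomorrow until the code is fixed.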

EDIT: I'm sure there are places where random tests are useful, although probably not for unit tests. However, in this case I'd like to suggest something: use one RNG to create a random but known seed, and then seed a new RNG with that value - and log it. That way if something interesting happens you will be able to reproduce it by starting an RNG with the logged seed.
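The seed idea could look something like the following in Python. This is only a sketch under some assumptions: the function names (`make_logged_rng`, `random_timestamps`, `check_round_trip`) are invented for the example, the timestamp range of 0 to 2**31 - 1 is an arbitrary choice, and the UTC round-trip check is a placeholder for the real code under test.

```python
import logging
import random
from datetime import datetime, timezone

log = logging.getLogger(__name__)


def make_logged_rng(seed=None):
    """Return (seed, rng). If no seed is given, draw one from a separate RNG."""
    if seed is None:
        # One RNG picks a random but known seed...
        seed = random.SystemRandom().randrange(2**32)
    # ...which is logged so that an interesting run can be repeated...
    log.info("random test seed: %d", seed)
    # ...and then used to seed the RNG that generates the test inputs.
    return seed, random.Random(seed)


def random_timestamps(rng, count):
    """Generate `count` POSIX timestamps (assumed range: 0 to 2**31 - 1)."""
    return [rng.randrange(0, 2**31) for _ in range(count)]


def check_round_trip(seed=None, count=1000):
    """Placeholder check: timestamp -> UTC datetime -> timestamp."""
    seed, rng = make_logged_rng(seed)
    for ts in random_timestamps(rng, count):
        result = datetime.fromtimestamp(ts, tz=timezone.utc).timestamp()
        # Include both the seed and the input in the failure message.
        assert result == ts, f"seed={seed}, input={ts}, got {result}"
    return seed
```

To replay a failing run, pass the logged value back in, e.g. `check_round_trip(seed=123456789)`; the same seed produces the same sequence of inputs.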

Jon Skeet answered Sep 30 '22 10:09