Testing a function that can return non-deterministic results using Python unittest

I am writing a small job scheduler in Python. The scheduler can be given a series of callables plus dependencies, and should run the callables, making sure that no task is run before any of its predecessors.

I am trying to follow a test-driven approach, and I have run into an issue testing dependency handling. My test code looks like this:

def test_add_dependency(self):
    """Tasks can be added with dependencies"""
    # TODO: Unreliable test, may work sometimes because by default, task
    #       running order is indeterminate.
    self.done = []
    def test(id):
        self.done.append("Test " + id)
    s = Schedule()
    tA = Task("Test A", partial(test, "A"))
    tB = Task("Test B", partial(test, "B"))
    s.add_task(tA)
    s.add_task(tB)
    s.add_dependency(tA, tB)
    s.run()
    self.assertEqual(self.done, ["Test B", "Test A"])

The problem is that this test (sometimes) worked even before I added the dependency handling code. This is because the specification does not state that tasks have to be run in a particular order. So the correct order is a perfectly valid choice even if the dependency information is ignored.

Is there a way of writing tests to avoid this sort of "accidental" success? It seems to me that this is a fairly common sort of situation, particularly when taking the test-driven "don't write code without a failing test" approach.

Asked Apr 12 '13 by Paul Moore

1 Answer

You are in the situation of every researcher looking at a collection of imperfect data and trying to decide whether a hypothesis about it is true.

If the results vary between runs, then rerunning the test many times gives you a sample to which you can apply statistics to decide whether the scheduler is working. However, if one batch of runs gives you similar results but a batch on a different day gives different ones, then the non-determinism depends on events outside the program itself, and you will need to find a way to control those events, ideally so that they maximise the chances of tripping up a bad algorithm.
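
If repeated runs really do shuffle the order, one straightforward way to collect that sample is inside the test itself: repeat the whole set-up-and-run cycle and require the dependency to hold every time. This is only a sketch; it assumes the Schedule, Task and partial names from the question, and the repetition count of 100 is arbitrary.

def test_add_dependency_repeated(self):
    """Dependencies must be respected on every run, not just by luck."""
    for _ in range(100):          # arbitrary number of repetitions
        done = []
        def record(name):
            done.append("Test " + name)
        s = Schedule()
        tA = Task("Test A", partial(record, "A"))
        tB = Task("Test B", partial(record, "B"))
        s.add_task(tA)
        s.add_task(tB)
        s.add_dependency(tA, tB)  # tA depends on tB, so tB must run first
        s.run()
        self.assertEqual(done, ["Test B", "Test A"])

If a dependency-ignoring scheduler really does pick between the two orders at random on each run, the chance of all 100 repetitions coming out as ["Test B", "Test A"] by accident is 0.5^100, so a persistent pass is evidence rather than luck. The catch, as above, is that the default order may be stable within a process (insertion order, for example), in which case repeating the run in the same process tells you nothing new.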

This is the cost of non-determinism: you have to resort to statistics, and you have to get the statistics right. You need to be able to accept the hypothesis at some confidence level and also to reject the null hypothesis. That takes fewer samples if you can maximise the variance of the results: vary the CPU load, generate I/O interrupts, or schedule tasks that sleep for random intervals.
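
If the scheduler runs tasks concurrently (an assumption, since the question does not say), injecting a small random delay into each task is one cheap way to increase that variance: an implementation that ignores dependencies then becomes much more likely to finish the tasks in the wrong order. Again a sketch only, reusing the question's names; the 10 ms sleep bound is made up.

import random
import time

def test_add_dependency_with_jitter(self):
    """Random per-task delays must not be able to break the ordering."""
    done = []
    def record(name):
        # Arbitrary jitter: perturbs timing so a scheduler that ignores
        # dependencies is likely to interleave the tasks wrongly.
        time.sleep(random.uniform(0, 0.01))
        done.append("Test " + name)
    s = Schedule()
    tA = Task("Test A", partial(record, "A"))
    tB = Task("Test B", partial(record, "B"))
    s.add_task(tA)
    s.add_task(tB)
    s.add_dependency(tA, tB)  # tB must complete before tA starts
    s.run()
    self.assertEqual(done, ["Test B", "Test A"])

Whether this helps at all depends on what actually drives the non-determinism, which is the point of the next paragraph.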

Finding out what such a scheduler is actually affected by is probably worth doing anyway, since it tells you what a worthwhile test needs to exercise.

Answered Sep 21 '22 by Phil H