Automatically detect test coupling

We have a large test codebase with more than 1500 tests for a Python/Django application. Most of the tests use factory-boy for generating data for the project models.

Currently, we are using nose test runner, but open to switching to py.test.

The problem is that from time to time, when running subsets or particular combinations of tests, we encounter unexpected test failures that are not reproduced when running all the tests together or when running the failing tests individually.

It looks like the tests are actually coupled.

The Question: Is it possible to automatically detect all the coupled tests in the project?

My current thinking is to run all the tests in different random combinations or orders and report the failures. Can nose or py.test help with that?

asked Dec 24 '16 by alecxe

3 Answers

For a definite answer you'd have to run each test in complete isolation from the rest.

With pytest, which is what I use, you could implement a script that first runs it with --collect-only and then uses the returned test node ids to initiate an individual pytest run for each of them. This will take a good while for your 1500 tests, but it should do the job as long as you completely recreate the state of your system between each individual test.
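
A minimal sketch of such a script (assuming Python 3.7+ for subprocess.run's capture_output/text arguments, and that each pytest invocation starts against freshly recreated state, e.g. a new test database):

    import subprocess

    # Ask pytest for the node ids of all tests without running them.
    # With -q, --collect-only prints one node id per line plus a summary,
    # so keep only the lines that look like node ids.
    result = subprocess.run(
        ["pytest", "--collect-only", "-q"],
        capture_output=True, text=True,
    )
    node_ids = [line.strip() for line in result.stdout.splitlines() if "::" in line]

    # Run each test in its own pytest process so that no in-process state
    # (imported modules, caches, monkeypatched globals, ...) can leak
    # from one test to the next.
    failing_alone = [
        node_id for node_id in node_ids
        if subprocess.run(["pytest", node_id]).returncode != 0
    ]

    print("Tests that fail even in isolation:")
    print("\n".join(failing_alone) or "(none)")

Any test that still fails here is broken on its own; anything that passes here but fails in a combined run is a coupling candidate.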

For an approximate answer, you can try running your tests in random order and see how many start failing. I had a similar question recently, so I tried two pytest plugins, pytest-randomly and pytest-random:

https://pypi.python.org/pypi/pytest-randomly/
https://pypi.python.org/pypi/pytest-random/

From the two, pytest-randomly looks like the more mature one and even supports repeating a certain order by accepting a seed parameter.
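
If I remember the option correctly (it is documented by pytest-randomly; double-check against your installed version), the seed printed in the run header can be passed back to replay a failing order:

    pytest --randomly-seed=1234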

These plugins do a good job of randomising the test order, but for a large test suite complete randomisation may not be workable: you end up with too many failing tests and no clear place to start.

I wrote my own plugin that allows me to control the level at which the tests can change order randomly (module, package, or global). It is called pytest-random-order: https://pypi.python.org/pypi/pytest-random-order/
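
If I read the plugin's docs correctly, the shuffling level is chosen with a bucket option, e.g.:

    pytest --random-order-bucket=module

so that tests are only reordered within their own module, which keeps the set of candidate culprits for any new failure small.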

UPDATE. In your question you say that the failures cannot be reproduced when running tests individually. It could be that you aren't completely recreating the environment for the individual test run. I think it's OK for some tests to leave state dirty: it is the responsibility of each test case to set up the environment as it needs it, and not necessarily to clean up afterwards, given the performance overhead that would impose on subsequent tests or simply the burden of doing it.

If test X fails as part of a larger test suite but does not fail when run individually, then test X is not doing a good enough job of setting up the environment it needs.

answered by jbasko


As you are already using the nosetests framework, perhaps you can use nose-randomly (https://pypi.python.org/pypi/nose-randomly) to run the test cases in a random order.

Every time you run the tests with nose-randomly, the run is tagged with a random seed which you can use later to repeat the tests in the same order.

So run your test cases with this plugin multiple times and record the random seeds. Whenever you see failures with a particular order, you can reproduce them by re-running with that seed.
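
A hedged sketch of that workflow (flag names as I recall them from the nose-randomly docs; verify against your installed version):

    nosetests --with-randomly                      # shuffled run; the seed is printed in the header
    nosetests --with-randomly --randomly-seed=123  # replay a previously failing order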

Strictly speaking, it is not possible to identify all test dependencies without running every combination of the 1500 tests: there are 2^1500 - 1 non-empty subsets (roughly 10^451, and even more once you count orderings), which is clearly infeasible.

So make it a habit to always run your tests with randomisation enabled. At some point you will hit failures; keep re-running until you have caught as many of them as possible.

Unless the failures are catching real bugs in your product, it is good practice to fix them and keep test dependencies to a minimum. This keeps test results consistent, and it means you can always run and verify a test case independently and be sure of the quality of your product around that scenario.

Hope that helps; this is what we do at our workplace to deal with exactly the situation you describe.

answered by Joshi Sravan Kumar


I've resolved similar issues on a large Django project which was also using the nose runner and factory-boy. I can't tell you how to automatically detect test coupling as the question asks, but with hindsight I can describe some of the issues that were causing coupling in my case:

Check all imports of TestCase and make sure they use Django's TestCase and not unittest's TestCase. If some developers on the team are using PyCharm, which has a handy auto-import feature, it can be very easy to accidentally import the name from the wrong place. The unittest TestCase will happily run in a big Django project's test suite, but you may not get the nice commit and rollback features that the Django test case has.
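
For example, the difference is just the import line (both module paths are the standard ones):

    # Correct: Django's TestCase wraps each test in a transaction
    # and rolls it back afterwards.
    from django.test import TestCase

    # Easy to pick from an IDE auto-import, but offers no db rollback:
    # from unittest import TestCase

Grepping the test tree for "from unittest import TestCase" is a cheap way to find offenders.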

Make sure that any test class which overrides setUp, tearDown, setUpClass, tearDownClass also delegates to super. I know this sounds obvious, but it's very easy to forget!
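
The pattern looks like this (the test class, the create_shared_fixture helper, and PotatoFactory are hypothetical; the super calls are the point):

    from django.test import TestCase

    class PotatoTests(TestCase):
        @classmethod
        def setUpClass(cls):
            # Skipping this super call breaks Django's class-level setup
            # (e.g. the transaction handling that isolates tests).
            super(PotatoTests, cls).setUpClass()
            cls.shared = create_shared_fixture()  # hypothetical helper

        def setUp(self):
            # Skipping this super call skips any per-test setup
            # defined higher up the class hierarchy.
            super(PotatoTests, self).setUp()
            self.potato = PotatoFactory()  # hypothetical factory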

It is also possible for mutable state to sneak in via factory-boy. Be careful with usages of factory sequences, which look something like:

    name = factory.Sequence(lambda n: 'alecxe-{0}'.format(n))

Even if the db is clean, the sequence may not start at 0 if other tests have run beforehand. This can bite you if you have made assertions with incorrect assumptions about what the values of Django models will be when created by factory boy.

Similarly, you can't make assumptions about primary keys. Suppose a Django model Potato is keyed off an auto-field, there are no Potato rows at the beginning of a test, and factory-boy creates a potato, e.g. you used PotatoFactory() in setUp. Surprisingly, you are not guaranteed that the primary key will be 1. You should hold a reference to the instance returned by the factory and make assertions against that actual instance.
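
To make both pitfalls concrete (Potato and PotatoFactory as above; the assertions are only illustrative):

    # Fragile: assumes a fresh sequence counter and an empty table.
    potato = PotatoFactory()
    assert potato.name == 'alecxe-0'  # fails if earlier tests advanced the sequence
    assert potato.pk == 1             # fails if earlier tests consumed ids

    # Robust: assert against the instance the factory actually returned.
    potato = PotatoFactory()
    fetched = Potato.objects.get(pk=potato.pk)
    assert fetched.name == potato.name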

Be very careful also with RelatedFactory and SubFactory. Factory-boy has a habit of picking any old instance to satisfy a relation if one is already hanging around in the db. This means the related object you get is sometimes not repeatable: if other objects are created in setUpClass or fixtures, the related object chosen (or created) by a factory may be unpredictable, because the order of the tests is arbitrary.
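
One way to take back control is to create the related object yourself and pass it in explicitly; FarmFactory and the farm field below are hypothetical names:

    farm = FarmFactory()               # create the related row explicitly
    potato = PotatoFactory(farm=farm)  # pass it in, so no SubFactory/RelatedFactory
                                       # gets a chance to pick a stale instance
    assert potato.farm == farm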

Situations where Django models have @receiver decorators with post_save or pre_save hooks are very tricky to handle correctly with factory-boy. For better control over related objects, including the cases where just grabbing any old instance may not be correct, you sometimes have to handle details yourself by overriding the _generate class method on a factory and/or implementing your own hooks using the @factory.post_generation decorator.
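
A sketch of the @factory.post_generation approach (the Potato model and its reviews relation are hypothetical; the decorator and its signature are factory-boy's documented API):

    import factory

    class PotatoFactory(factory.django.DjangoModelFactory):
        class Meta:
            model = Potato  # hypothetical Django model

        name = factory.Sequence(lambda n: 'alecxe-{0}'.format(n))

        @factory.post_generation
        def reviews(self, create, extracted, **kwargs):
            # Runs after the instance is built/created, so related objects
            # can be attached deterministically instead of relying on
            # whatever a RelatedFactory or a post_save receiver would do.
            if not create:
                return  # build strategy: nothing was saved to the db
            for review in (extracted or []):
                self.reviews.add(review)

In a test you would then write PotatoFactory(reviews=[review_a, review_b]) and know exactly which related rows exist.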

answered by wim