We have a large test codebase with more than 1500 tests for a Python/Django application. Most of the tests use factory-boy for generating data for the project models. Currently we are using the nose test runner, but we are open to switching to py.test.
The problem is that, from time to time, when running subsets or particular combinations of tests, we encounter unexpected failures that are not reproduced when running the whole suite or when running those tests individually.
It looks like the tests are actually coupled.
The Question: Is it possible to automatically detect all the coupled tests in the project?
My current thinking is to run the tests in different random combinations or orders and report the failures. Can nose or py.test help with that?
For a definite answer you'd have to run each test in complete isolation from the rest.
With pytest, which is what I use, you could implement a script that first runs it with --collect-only and then uses the returned test node ids to initiate an individual pytest run for each of them.
This will take a good while for your 1500 tests, but it should do the job as long as you completely recreate the state of your system between each individual test.
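Something along these lines would do it (a rough sketch only: --collect-only and -q are standard pytest options, but the output parsing and the reporting here are just illustrative glue):

# Rough sketch: run every collected test in its own pytest process.
# Assumes pytest is on PATH and this is run from the project root (Python 3.7+).
import subprocess

# "-q --collect-only" prints one test node id per line plus a summary line;
# filtering on "::" is a simple heuristic to keep only the node ids.
output = subprocess.run(
    ["pytest", "-q", "--collect-only"],
    capture_output=True, text=True,
).stdout
node_ids = [line.strip() for line in output.splitlines() if "::" in line]

failed_in_isolation = []
for node_id in node_ids:
    # A completely fresh pytest process per test.
    if subprocess.run(["pytest", node_id]).returncode != 0:
        failed_in_isolation.append(node_id)

# A test that passes here but fails as part of the full suite is a coupling suspect.
print("\n".join(failed_in_isolation) or "all tests pass in isolation")

Running 1500 separate pytest processes is slow, but each one starts from a fresh interpreter and, with the usual Django test setup, a fresh test database, which is exactly the isolation this approach relies on.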
For an approximate answer, you can try running your tests in random order and see how many start failing. I had a similar question recently, so I tried two pytest plugins, pytest-randomly and pytest-random:
https://pypi.python.org/pypi/pytest-randomly/
https://pypi.python.org/pypi/pytest-random/
Of the two, pytest-randomly looks like the more mature one and even supports repeating a certain order by accepting a seed parameter.
These plugins do a good job of randomising the test order, but for a large test suite complete randomisation may not be very workable, because you then have too many failing tests and you don't know where to start.
I wrote my own plugin that allows me to control the level at which the tests can change order randomly (module, package, or global). It is called pytest-random-order: https://pypi.python.org/pypi/pytest-random-order/
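To illustrate (treat the exact flag spellings as assumptions and check each plugin's docs for the version you install), you can pass the relevant options on the command line or drive them from Python via pytest.main:

import pytest

# pytest-randomly: repeat the exact order of an earlier failing run by passing
# back the seed it printed at the start of that run (the value here is made up).
pytest.main(["--randomly-seed=1234"])

# pytest-random-order: shuffle tests only within each module, so a new failure
# points at a much smaller group of suspects. Run this in a separate process.
# pytest.main(["--random-order-bucket=module"])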
UPDATE. In your question you say that the failures cannot be reproduced when running tests individually. It could be that you aren't completely recreating the environment for the individual test run. I think it's OK for some tests to leave state dirty: it is the responsibility of each test case to set up the environment it needs, and not necessarily to clean up afterwards, because of the performance overhead this would cause for subsequent tests or simply because of the burden of doing it.
If test X fails as part of a larger test suite but does not fail when run individually, then test X is not doing a good enough job of setting up the environment for itself.
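For example (hypothetical model, factory, and test, purely to illustrate the point): instead of asserting against whatever happens to be in the database, a test should create, or clear out, the rows it is going to reason about:

from django.test import TestCase

from myapp.factories import ArticleFactory   # hypothetical factory
from myapp.models import Article              # hypothetical model


class ArticleCountTest(TestCase):
    def setUp(self):
        super(ArticleCountTest, self).setUp()
        # Set up exactly the state this test depends on instead of assuming
        # that no other test has left Article rows behind.
        Article.objects.all().delete()
        self.article = ArticleFactory()

    def test_exactly_one_article(self):
        # Safe: the count only reflects what this test created in setUp.
        self.assertEqual(Article.objects.count(), 1)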
As you are already using the nosetests framework, perhaps you can use nose-randomly (https://pypi.python.org/pypi/nose-randomly) to run the test cases in a random order.
Every time you run the nose tests with nose-randomly, the run is tagged with a random seed which you can use to repeat the same order later.
So run your test cases with this plugin multiple times and record the random seeds. Whenever you see failures with a particular order, you can reproduce them by re-running with that seed.
Strictly speaking, it is not possible to identify all the test dependencies unless you run every combination of your 1500 tests, which is 2^1500 - 1 combinations and clearly infeasible.
So make it a habit to always run your tests with randomisation enabled. Sooner or later you will hit failures; keep running until you have caught as many as possible.
Unless those failures are catching real bugs in your product, it is good practice to fix them and reduce test dependencies as much as possible. This keeps test results consistent, and you can always run and verify a test case independently and be sure of the quality of your product around that scenario.
Hope that helps; this is what we do at my workplace to deal with exactly the situation you describe.
I've resolved similar issues on a large Django project which was also using the nose runner and factory-boy. I can't tell you how to automatically detect test coupling as the question asks, but with hindsight I can describe some of the issues that were causing coupling in my case:
Check all imports of TestCase and make sure they use Django's TestCase and not unittest's TestCase. If some developers on the team are using PyCharm, which has a handy auto-import feature, it can be very easy to accidentally import the name from the wrong place. The unittest TestCase will happily run in a big Django project's test suite, but you may not get the nice commit and rollback features that the Django test case has.
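In other words, the import you want is Django's:

# Django's TestCase wraps each test in a transaction and rolls it back, so
# database changes don't leak between tests.
from django.test import TestCase

# Easy to pick by accident with auto-import, and it runs happily, but it gives
# you no per-test rollback:
# from unittest import TestCase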
Make sure that any test class which overrides setUp, tearDown, setUpClass, or tearDownClass also delegates to super. I know this sounds obvious, but it's very easy to forget!
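For example (hypothetical test class, just to show the pattern):

from django.test import TestCase


class ProfileTests(TestCase):
    @classmethod
    def setUpClass(cls):
        super(ProfileTests, cls).setUpClass()   # easy to forget
        # ... class-level setup ...

    def setUp(self):
        super(ProfileTests, self).setUp()
        # ... per-test setup ...

    def tearDown(self):
        # ... per-test cleanup ...
        super(ProfileTests, self).tearDown()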
It is also possible for mutable state to sneak in via factory-boy. Be careful with usages of factory sequences, which look something like:
name = factory.Sequence(lambda n: 'alecxe-{0}'.format(n))
Even if the db is clean, the sequence may not start at 0 if other tests have run beforehand. This can bite you if you have made assertions with incorrect assumptions about what the values of Django models will be when created by factory boy.
Similarly, you can't make assumptions about primary keys. Suppose a Django model Potato is keyed off an auto-field, there are no Potato rows at the beginning of a test, and factory boy creates a potato, i.e. you used PotatoFactory() in the setUp. You are not guaranteed that the primary key will be 1, surprisingly. You should hold a reference to the instance returned by the factory, and make assertions against that actual instance.
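Putting the last two points together, here is a sketch with a hypothetical Potato model and factory (the module path is made up; only the factory-boy and Django APIs shown are real):

import factory
from django.test import TestCase

from myapp.models import Potato   # hypothetical app and model


class PotatoFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Potato

    name = factory.Sequence(lambda n: 'alecxe-{0}'.format(n))


class PotatoTests(TestCase):
    def setUp(self):
        super(PotatoTests, self).setUp()
        self.potato = PotatoFactory()   # hold on to the instance

    def test_lookup_by_reference(self):
        # Fragile: Potato.objects.get(pk=1); the auto-field may not start at 1.
        # Fragile: assuming self.potato.name == 'alecxe-0'; the sequence counter
        # depends on what ran before this test.
        fetched = Potato.objects.get(pk=self.potato.pk)
        self.assertEqual(fetched.name, self.potato.name)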
Be very careful also with RelatedFactory and SubFactory. Factory boy has a habit of picking any old instance to satisfy a relation, if one already exists hanging around in the db. This means what you get as a related object is sometimes not repeatable: if other objects are created in setUpClass or fixtures, the related object chosen (or created) by a factory may be unpredictable, because the order of the tests is arbitrary.
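One way to take the choice away from the factory is to create the related object yourself and pass it in explicitly (hypothetical models and factories again; overriding a SubFactory field with a keyword argument is standard factory-boy behaviour):

import factory

from myapp.models import Farmer, Potato   # hypothetical models


class FarmerFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Farmer


class PotatoFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Potato

    farmer = factory.SubFactory(FarmerFactory)


# In a test: pin the relation explicitly so it does not depend on whatever
# happens to be sitting in the db when this test runs.
farmer = FarmerFactory()
potato = PotatoFactory(farmer=farmer)
assert potato.farmer == farmer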
Situations where Django models have @receiver decorators with post_save or pre_save hooks are very tricky to handle correctly with factory boy. For better control over related objects, including the cases where just grabbing any old instance may not be correct, you sometimes have to handle details yourself by overriding the _generate class method on a factory and/or implementing your own hooks using the @factory.post_generation decorator.
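A post-generation hook looks something like this (hypothetical Potato model with an "eyes" many-to-many field; the hook signature is the one factory-boy documents for @factory.post_generation):

import factory

from myapp.models import Potato   # hypothetical model


class PotatoFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Potato

    @factory.post_generation
    def eyes(self, create, extracted, **kwargs):
        # Runs after the instance has been built/created, so you decide exactly
        # what related data is attached instead of relying on whatever a signal
        # handler or a stray SubFactory picked.
        if not create:
            return                      # build() strategy: stay off the db
        if extracted:
            for eye in extracted:
                self.eyes.add(eye)

# Usage in a test: PotatoFactory(eyes=[eye1, eye2]) passes the list in as
# "extracted" above; a plain PotatoFactory() leaves the relation empty.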