During one of the recent code reviews, I've stumbled upon the problem that was not immediately easy to spot - there was assertTrue()
used instead of assertEqual()
that basically resulted into a test that was testing nothing. Here is a simplified example:
from unittest import TestCase
class MyTestCase(TestCase):
def test_two_things_equal(self):
self.assertTrue("a", "b")
The problem here is that the test would pass; and technically, the code is valid, since assertTrue
has this optional msg
argument (that gets the "b"
value in this case).
Can we do better than rely on the person reviewing the code to spot this kind of problems? Is there a way to auto-detect it using static code analysis with flake8
or pylint
?
Python now has a type hinting system that does static code analysis. Using this system you can require that the first argument of a function like assertTrue
is always boolean. The problem is that assertTrue
is not defined by you but by the unittest package. Unfortunately, the unittest package didn't add type hints. There's a reasonably simple way around that though: Just define your own wrapper.
from unittest import TestCase
class TestCaseWrapper(TestCase):
def assertTrue(self, expr: bool, msg=None): #The ": bool" requires that the expr parameter is boolean.
TestCase.assertTrue(self, expr, msg)
class MyTestCase(TestCaseWrapper):
def test_two_things_equal(self):
self.assertTrue("a", "b") #Would give a warning about the type of "a".
You can then run the type checker like so:
python -m mypy my_test_case.py
This should then give you a warning about how "a" is a string, not a boolean. The nice thing about this is that it can be run automatically in an automated test framework. Also, PyCharm will check the types in your code if you provide them and highlight anything that's wrong.
Several years ago I came up with a general approach/methodology to assuring the quality of tests. The specification of a test can be reduced to two clauses:
To the best of my knowledge, while the requirement 1. is routinely being exercised, little attention is being paid to requirement 2.
Typically
The actual situation may be that (some of) the tests contain bugs that (would) prevent them from catching bugs in the code. Therefore, seeing tests pass shouldn't suggest much tranquility to the person caring about the quality of the system, until they are confident that the tests are indeed able to detect the problems they were designed against1. And a simple way to do it is to actually introduce such problems and check that they don't remain unnoticed by the tests!
In TDD (test-driven development), this idea is followed only partially - the recommendation is to add the test before the code, see it fail (it should, since there is no code yet) and then fix it by writing the code. But failure of a test because of missing code doesn't automatically mean that it will also fail in case of buggy code (this seems to be true for your case)!
So the quality of a test suite can be measured as a percentage of bugs that it would be capable of detecting. Any reasonable2 bug that escapes a test-suite suggests a new test case covering that scenario (or, if the test suite should have caught that bug, a bug in the test suite is uncovered). This also means that every test of the suite must be able to catch at least one bug (otherwise, that test is completely pointless).
I was thinking about implementing a software system that facilitates adopting this methodology (i.e. allows injecting and maintaining artificial bugs in the code base and checks how the tests respond to them). This question acted as a trigger that I am going to start working on it right away. Hoping to put something together within a week. Stay tuned!
EDIT
A prototype version of the tool is now available at https://bitbucket.org/leon_manukyan/trit. I recommend cloning the repository and running the demo flow.
1 A more generalized version of this statement is true for a wider range of systems/situations (all typically having to do with security/safety):
A system designed against certain events must be routinely tested against such events, otherwise it is prone to degradation down to complete inability to react against the events of interest.
Just an example - do you have a fire alarm system at home? When did you witness it work last time? What if it stays silent during fire too? Go make some smoke in the room right now!
2 Within the scope of this methodology, a back-door like bug (e.g. when the feature misbehaves only if the passed in URL is equal to https://www.formatmyharddrive.com/?confirm=yesofcourse
) is not a reasonable one
One solution for this kind of problem is to use "mutation testing". This idea is to automatically generate "mutants" of your code, by introducing small changes in it. Then your test suite is run against these mutants and if it is good most of them should be killed, meaning that your test suite detects the mutation and the tests fail.
Mutation testing actually evaluates the quality of your tests. In your example, no mutants would be killed and you would easily detect that there's something wrong with the test.
In python, there are several mutation frameworks available:
A quick solution would be to provide a Mixin that checks the correctness:
import unittest
class Mixin(object):
def assertTrue(self, *args, **kwargs):
if len(args) > 1:
# TypeError is just an example, it could also do some warning/logging
# stuff in here.
raise TypeError('msg should be given as keyword parameter.')
super().assertTrue(*args, **kwargs)
class TestMixin(Mixin, unittest.TestCase): # Mixin before other parent classes
def test_two_things_equal(self):
self.assertTrue("a", "b")
The Mixin could also check if the passed expression is a boolean:
class Mixin(object):
def assertTrue(self, *args, **kwargs):
if type(args[0]) is bool:
raise TypeError('expression should be a boolean')
if len(args) > 1:
raise TypeError('msg should be given as keyword parameter.')
super().assertTrue(*args, **kwargs)
However this isn't static and it requires manually altering your test classes (adding the Mixin) and running the tests. Also it will throw a lot of false-positives because passing the message as keyword-argument isn't really common (at least not where I've seen it) and in a lot of cases you want to check the implicit truthiness of the expression instead of the explicit bool
. Like to check for not-emptiness if a
when a
is a list
, dict
, etc.
You could also use some setUp
, teardown
code that alters the assertTrue
method for the particular class:
import unittest
def decorator(func):
def wrapper(*args, **kwargs):
if len(args) > 1:
raise TypeError()
return func(*args, **kwargs)
return wrapper
class TestMixin(unittest.TestCase):
def setUp(self):
self._old = self.assertTrue
self.assertTrue = decorator(self.assertTrue)
def tearDown(self):
self.assertTrue = self._old
def test_two_things_equal(self):
self.assertTrue("a", "b")
But a word of caution before you apply any of these approaches: Always be careful before you alter existing tests. Unfortunatly tests are sometimes poorly documentated, so it's not always obvious what they test for and how they test for it. Sometime a test makes no sense and it's safe to alter it, but sometimes it tests a particular feature in a weird way and when you change it you change what is being tested. So at the very least make sure there's no coverage change when you change the test case. If necessary make sure you clarify the purpose of the test by updating the method name, method documentation or in-line comments.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With