I am running a test suite with hypothesis-4.24.6 and pytest-5.0.0. My test has a finite set of possible inputs, but hypothesis never finishes testing.
I have reduced it to the following minimal example, which I run as pytest test.py:

from hypothesis import given
import hypothesis.strategies as st

@given(x=st.just(0) | st.just(1),
       y=st.just(0) | st.just(1) | st.just(2))
def test_x_y(x, y):
    assert True
I would expect it to try all six combinations and then succeed, or possibly a small multiple of that to check for flakiness. Instead it runs indefinitely (after about 15 minutes of testing I killed it).
If I interrupt the test, the backtraces show it just continuously generating new examples.
What have I done wrong here?
This seems to be connected to the number of successful test cases Hypothesis tries to generate:

>>> from hypothesis import given, strategies as st
>>> @given(st.integers(0, 1), st.integers(0, 2))
... def test(x, y):
...     print(x, y)
...     assert True
...
>>> test()
0 0
1 1
1 0
1 2
1 1
0 1
0 0
1 2
0 2
0 2
1 0
1 2
0 1
0 1
1 2
[snip…]
See this part of the docs, for instance: by default Hypothesis aims for 100 successful test cases. Here the strategies can only ever produce 6 distinct inputs, so Hypothesis keeps generating more and more data that collapses onto those same 6 cases without reaching its target.
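If you want to confirm the default on your installed version, the settings object exposes it (a quick check, not part of the fix):

>>> from hypothesis import settings
>>> settings.default.max_examples
100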
The simplest approach is to limit the number of examples needed for this test to pass:
>>> from hypothesis import settings
>>> @settings(max_examples=30)
... @given(st.integers(0, 1), st.integers(0, 2))
... def test(x, y):
...     print(x, y)
...     assert True
...
>>> test()
0 0
1 1
1 0
0 2
1 2
0 1
0 1
1 1
1 0
1 1
0 1
1 2
1 1
0 0
0 2
0 2
0 0
1 2
1 0
0 1
1 0
1 0
0 1
1 2
1 1
0 2
0 0
1 2
0 0
0 2
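In the run above all six combinations happened to appear, but generation is random, so that is likely rather than guaranteed. If you want to check coverage, one rough way (a sketch of my own, not a Hypothesis feature) is to collect the drawn pairs:

>>> seen = set()
>>> @settings(max_examples=30)
... @given(st.integers(0, 1), st.integers(0, 2))
... def test(x, y):
...     seen.add((x, y))
...     assert True
...
>>> test()
>>> sorted(seen)  # on this run, all six pairs turned up
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]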
Another approach, given the small number of test cases, is to make them all explicit using @example and ask Hypothesis to run only those explicit examples:
>>> from hypothesis import given, example, settings, Phase, strategies as st
>>> @settings(phases=(Phase.explicit,))
... @given(x=st.integers(), y=st.integers())
... @example(x=0, y=0)
... @example(x=0, y=1)
... @example(x=0, y=2)
... @example(x=1, y=0)
... @example(x=1, y=1)
... @example(x=1, y=2)
... def test(x, y):
...     print(x, y)
...     assert True
...
>>> test()
0 0
0 1
0 2
1 0
1 1
1 2
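Listing six decorators by hand is fine here; if the finite input space were larger, you could build the same stack of @example decorators programmatically. A sketch (the all_pairs helper is my own, not a Hypothesis API):

>>> from itertools import product
>>> def all_pairs(f):
...     # attach @example(x=..., y=...) for every pair in {0,1} x {0,1,2}
...     for x, y in product(range(2), range(3)):
...         f = example(x=x, y=y)(f)
...     return f
...
>>> @settings(phases=(Phase.explicit,))
... @given(x=st.integers(), y=st.integers())
... @all_pairs
... def test(x, y):
...     assert True
...

Calling test() then runs exactly those six cases.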
Also note that st.just(0) | st.just(1) is equivalent to st.one_of(st.just(0), st.just(1)), so choose one spelling and stick to it rather than mixing them. (The examples above use st.integers(0, 1) and st.integers(0, 2), which describe the same sets of values more directly.)
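You can see the equivalence in the strategy reprs (the exact repr text may differ across Hypothesis versions):

>>> import hypothesis.strategies as st
>>> st.just(0) | st.just(1)
one_of(just(0), just(1))
>>> st.one_of(st.just(0), st.just(1))
one_of(just(0), just(1))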