There is a flaky test, and we have no clue what the root cause could be. <pre class="prettyprint"><code>with pytest.raises(foo.geocoder.GeocodingFailed): foo.geocoder.geocode('not a valid place') </code></pre> Sometimes the exception does not happen. I looked at the docs how to handle failures, but this did not help. How to trace <code>geocode()</code> so that if the exception does not happen, I see a trace? I looked at the <code>trace</code> module of the standard library. But there seems to be no easy way to get the trace as string. With "a trace" I mean: A trace of all lines which got executed during <code>gecode()</code>. I would like to see method calls and returns statements with indentation. I want to ignore lines from the Python standard library. AFAIK a debugger like pdb does not help here, since the test only fails when it gets run in CI, and only once or twice per month.

The <code>trace</code> module gives programmatic access via <code>trace.Trace</code> class. On test failures the <code>Trace</code> class' console output is visible. And it has coverage report to write on chosen path. I exclude <code>sys.base_exec_prefix</code>, <code>sys.base_prefix</code> to not trace Python library modules. <pre class="prettyprint"><code>import pytest def f(x): import random choice = random.choice((True, False)) if choice: raise ValueError else: return x * 2 def trace(f, *args, **kwargs): import trace import sys tracer = trace.Trace( ignoredirs=(sys.base_exec_prefix, sys.base_prefix), timing=True, ) ret = tracer.runfunc(f, *args, **kwargs) r = tracer.results() r.write_results(coverdir="/tmp/xx-trace") return ret def test_f(): with pytest.raises(ValueError): trace(f, 3) </code></pre> <img src="https://i.stack.imgur.com/ypXw5.png" alt="pytest run"> Coverage report; Lines marked with >>>>>> were not executed i.e. not traced and the numbers with colon are execution counts. <pre class="prettyprint"><code>>>>>>> import pytest >>>>>> def f(x): 1: import random 1: choice = random.choice((True, False)) 1: if choice: >>>>>> raise ValueError else: 1: return x * 2 >>>>>> def trace(f, *args, **kwargs): >>>>>> import trace >>>>>> import sys >>>>>> tracer = trace.Trace( >>>>>> ignoredirs=(sys.base_exec_prefix, sys.base_prefix), timing=True, ) >>>>>> tracer.runfunc(f, *args, **kwargs) >>>>>> r = tracer.results() >>>>>> r.write_results(coverdir="/tmp/xx-trace") >>>>>> def test_f(): >>>>>> with pytest.raises(ValueError): >>>>>> trace(f, 3) </code></pre>

The <code>trace</code> library does not help, as it does not enable writing to a <code>string</code> or a <code>StringIO</code> object, it only writes to real files. What you can do is to use <code>sys.settrace()</code> and define a simple function, which is called for every execution. You will find the documentation here: https://docs.python.org/3/library/sys.html#sys.settrace. The basic magic is to get the details out of the <code>frame</code>object, which is documented here: https://docs.python.org/3/library/inspect.html. A sample to give you an idea looks like this: <pre class="prettyprint"><code>import sys def tracer(frame, event, arg): print(frame.f_code.co_name, frame.f_code.co_filename, frame.f_lineno, event, arg) def bad_function(param: int): if param == 20: raise RuntimeError(f'that failed for {param}') sys.settrace(tracer) bad_function(1) bad_function(20) bad_function(2) </code></pre> It should be easy to store that information into a string for further investigation, or to handle the exception, if it is raised.

Hunt for root cause of flaky test in Python

There is a flaky test, and we have no clue what the root cause could be.

with pytest.raises(foo.geocoder.GeocodingFailed):
    foo.geocoder.geocode('not a valid place')

Sometimes the exception does not happen.

I looked at the docs how to handle failures, but this did not help.

How to trace geocode() so that if the exception does not happen, I see a trace?

I looked at the trace module of the standard library. But there seems to be no easy way to get the trace as string.

With "a trace" I mean: A trace of all lines which got executed during gecode(). I would like to see method calls and returns statements with indentation. I want to ignore lines from the Python standard library.

AFAIK a debugger like pdb does not help here, since the test only fails when it gets run in CI, and only once or twice per month.

What causes flaky tests?

Flaky tests can be caused by various factors: an issue with the newly-written code. an issue with the test itself. some external factor compromising the test results.

Why are e2e tests flaky?

As mentioned before, UI tests are sometimes flaky, and this basically happens because they intend to test the application in an end-to-end way, interacting with the system as the users would do, and some times, network issues, server slowness, and many other aspects can affect the test results, causing false negatives.

What is flaky test in pytest?

Flaky tests sometimes appear when a test suite is run in parallel (such as use of pytest-xdist). This can indicate a test is reliant on test ordering. Perhaps a different test is failing to clean up after itself and leaving behind data which causes the flaky test to fail.

What is @flaky?

Flaky is a plugin for nose or pytest that automatically reruns flaky tests. Ideally, tests reliably pass or fail, but sometimes test fixtures must rely on components that aren’t 100% reliable. With flaky, instead of removing those tests or marking them to @skip, they can be automatically retried.

How do I Mark a test as flaky?

With flaky, instead of removing those tests or marking them to @skip, they can be automatically retried. For more information about flaky, see this presentation. To mark a test as flaky, simply import flaky and decorate the test with @flaky:

How many times can flaky retry a failing test?

By default, flaky will retry a failing test once, but that behavior can be overridden by passing values to the flaky decorator. It accepts two parameters: max_runs, and min_passes; flaky will run tests up to max_runs times, until it has succeeded min_passes times.

The trace module gives programmatic access via trace.Trace class.
On test failures the Trace class' console output is visible.
And it has coverage report to write on chosen path.

I exclude sys.base_exec_prefix, sys.base_prefix to not trace Python library modules.

import pytest


def f(x):
    import random

    choice = random.choice((True, False))

    if choice:
        raise ValueError
    else:
        return x * 2


def trace(f, *args, **kwargs):
    import trace
    import sys

    tracer = trace.Trace(
        ignoredirs=(sys.base_exec_prefix, sys.base_prefix), timing=True,
    )
    ret = tracer.runfunc(f, *args, **kwargs)
    r = tracer.results()
    r.write_results(coverdir="/tmp/xx-trace")
    return ret


def test_f():
    with pytest.raises(ValueError):
        trace(f, 3)

pytest run

Coverage report;
Lines marked with >>>>>> were not executed i.e. not traced and the numbers with colon are execution counts.

>>>>>> import pytest
       
       
>>>>>> def f(x):
    1:     import random
       
    1:     choice = random.choice((True, False))
       
    1:     if choice:
>>>>>>         raise ValueError
           else:
    1:         return x * 2
       
       
>>>>>> def trace(f, *args, **kwargs):
>>>>>>     import trace
>>>>>>     import sys
       
>>>>>>     tracer = trace.Trace(
>>>>>>         ignoredirs=(sys.base_exec_prefix, sys.base_prefix), timing=True,
           )
>>>>>>     tracer.runfunc(f, *args, **kwargs)
>>>>>>     r = tracer.results()
>>>>>>     r.write_results(coverdir="/tmp/xx-trace")
       
       
>>>>>> def test_f():
>>>>>>     with pytest.raises(ValueError):
>>>>>>         trace(f, 3)

The trace library does not help, as it does not enable writing to a string or a StringIO object, it only writes to real files.

What you can do is to use sys.settrace() and define a simple function, which is called for every execution. You will find the documentation here: https://docs.python.org/3/library/sys.html#sys.settrace. The basic magic is to get the details out of the frameobject, which is documented here: https://docs.python.org/3/library/inspect.html.

A sample to give you an idea looks like this:

import sys


def tracer(frame, event, arg):
    print(frame.f_code.co_name, frame.f_code.co_filename, frame.f_lineno, event, arg)


def bad_function(param: int):
    if param == 20:
        raise RuntimeError(f'that failed for {param}')


sys.settrace(tracer)

bad_function(1)
bad_function(20)
bad_function(2)

It should be easy to store that information into a string for further investigation, or to handle the exception, if it is raised.

Hunt for root cause of flaky test in Python

Tags:

python

unit-testing

trace

pytest

guettli

People also ask

2 Answers

Nizam Mohamed

Stefan Sonnenberg-Carstens

Recent Activity

Donate For Us

Hunt for root cause of flaky test in Python

Tags:

python

unit-testing

trace

pytest

guettli

People also ask

2 Answers

Nizam Mohamed

Stefan Sonnenberg-Carstens

Related questions

Recent Activity

Donate For Us