When parametrizing tests and fixtures in pytest, pytest seems to eagerly evaluate all parameters and to build some test-list data structure before it starts executing the tests.
This is a problem in two situations: when the parameter data is large, so collecting everything up front wastes memory, and when building a parameter requires expensive setup (e.g. a DB connection or subprocess) that should only happen when the test actually runs.
Thus my question: is it possible to tell pytest to evaluate the parameters on the fly (i.e. lazily)?
pytest-lazy-fixture lets you use a fixture as one of the values passed to @pytest.mark.parametrize:

import pytest
from pytest_lazyfixture import lazy_fixture

@pytest.fixture
def one():
    return 1

@pytest.mark.parametrize('arg1,arg2', [
    ('val1', lazy_fixture('one')),
])
def test_func(arg1, arg2):
    assert arg2 == 1
pytest.fixture() allows one to parametrize fixture functions.
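For example, a minimal sketch of a parametrized fixture (the names are illustrative):

import pytest

@pytest.fixture(params=[1, 2, 3])
def number(request):
    # request.param holds the current parameter value;
    # every test using this fixture runs once per value.
    return request.param

def test_positive(number):
    assert number > 0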
pytest will build a string that is the test ID for each set of values in a parametrized test. These IDs can be used with -k to select specific cases to run, and they will also identify the specific case when one is failing. Running pytest with --collect-only will show the generated IDs.
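A small sketch of how those IDs look (the test names here are illustrative):

import pytest

@pytest.mark.parametrize('word', ['alpha', 'beta'])
def test_upper(word):
    assert word.upper().isupper()

# 'pytest --collect-only' lists test_upper[alpha] and test_upper[beta];
# 'pytest -k alpha' runs only the first case.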
As for your second question, the link to the manual proposed in the comments seems to be exactly what you should do: it allows you "to setup expensive resources like DB connections or subprocess only when the actual test is run".
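A minimal sketch of that idea, assuming a hypothetical connect_to_database() helper for the expensive resource:

import pytest

@pytest.fixture
def db_connection():
    # Nothing happens at collection time; the expensive setup runs
    # only when a test that requests this fixture is executed.
    conn = connect_to_database()   # hypothetical expensive setup
    yield conn
    conn.close()                   # teardown after the test finishes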
But as for the first question, it seems that such a feature is not implemented. You can pass a generator directly to parametrize, like so:

@pytest.mark.parametrize('data', data_gen)
def test_gen(data):
    ...

But pytest will call list() on your generator, so the RAM problem persists here as well.
I've also found some GitHub issues that shed more light on why pytest does not handle generators lazily, and it seems to be a design problem: "it's not possible to correctly manage parametrization having a generator as value" because "pytest would have to collect all those tests with all the metadata... collection happens always before test running".
There are also some references to hypothesis or nose's yield-based tests for such cases. But if you still want to stick with pytest, there are some workarounds:
import pytest

def get_data(N):
    for i in range(N):
        yield list(range(N))

N = 3000
data_gen = get_data(N)

@pytest.mark.parametrize('ind', range(N))
def test_yield(ind):
    data = next(data_gen)
    assert data
So here you parametrize over the index (which is not very useful by itself; it just tells pytest how many runs it must make) and generate the data inside each run. You can also wrap it with memory_profiler:
Results (46.53s):
3000 passed
Filename: run_test.py
Line # Mem usage Increment Line Contents
================================================
5 40.6 MiB 40.6 MiB @profile
6 def to_profile():
7 76.6 MiB 36.1 MiB pytest.main(['test.py'])
And compare with the straightforward version:
@pytest.mark.parametrize('data', data_gen)
def test_yield(data):
    assert data
Which 'eats' much more memory:
Results (48.11s):
3000 passed
Filename: run_test.py
Line # Mem usage Increment Line Contents
================================================
5 40.7 MiB 40.7 MiB @profile
6 def to_profile():
7 409.3 MiB 368.6 MiB pytest.main(['test.py'])
data_gen = get_data(N)

@pytest.fixture(scope='module', params=range(len_of_gen_if_known))
def fix():
    huge_data_chunk = next(data_gen)
    return huge_data_chunk

@pytest.mark.parametrize('other_param', ['aaa', 'bbb'])
def test_one(fix, other_param):
    data = fix
    ...
So we use the fixture here at module scope in order to "preset" our data for the parametrized tests. Note that you can add another test right here and it will receive the generated data as well. Simply add it after test_one:
@pytest.mark.parametrize('param2', [15, 'asdb', 1j])
def test_two(fix, param2):
    data = fix
    ...
NOTE: if you do not know how many items will be generated, you can use this trick: set some approximate value (preferably a bit higher than the number of generated tests) and "mark" the remaining tests as passed when the generator stops with StopIteration, which happens once all the data has been generated.
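A rough sketch of that trick; here the leftover cases are skipped rather than marked passed, and the overshoot value is an assumption:

import pytest

data_gen = get_data(N)

# Overshoot the expected count a little; once the generator is exhausted
# the remaining parametrized cases end early instead of failing.
@pytest.fixture(scope='module', params=range(N + 10))
def fix(request):
    try:
        return next(data_gen)
    except StopIteration:
        pytest.skip("all data already generated")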
Another possibility is to use factories as fixtures: you embed your generator into a fixture and iterate over it inside your test until it is exhausted. But there is another disadvantage here - pytest will treat it as a single test (with possibly a bunch of checks inside) and it will fail if any of the generated data fails. In other words, compared to the parametrize approach, not all pytest statistics/features are available.
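A minimal sketch of that factory approach, reusing the get_data generator from above:

import pytest

@pytest.fixture
def data_factory():
    # Create the generator lazily, only when the test runs.
    return get_data(N)

def test_all_chunks(data_factory):
    # A single pytest test that consumes every generated chunk;
    # the first failing chunk fails the whole test.
    for chunk in data_factory:
        assert chunk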
And yet another option is to call pytest.main() in a loop, something like:

# generate data
# set up the test
pytest.main(['test'])
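A rough sketch of that loop; the environment-variable mechanism and the test_chunk.py file name are assumptions for illustration, not part of the original answer:

import os
import pytest

for i in range(N):
    # Prepare the data for this iteration, then hand the index to the
    # test module through an environment variable and launch pytest.
    os.environ['DATA_INDEX'] = str(i)
    pytest.main(['test_chunk.py'])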
@pytest.mark.parametrize("one", list_1)
@pytest.mark.parametrize("two", list_2)
def test_maybe_convert_objects(self, one, two):
...
Change to:
@pytest.mark.parametrize("one", list_1)
def test_maybe_convert_objects(self, one):
for two in list_2:
...
It's similar to factories but even easier to implement. It not only reduces RAM usage several times over, but also the time spent collecting meta-info. The drawback is that for pytest this is a single test covering all the two values. And it works smoothly with "simple" tests - if you have special marks (xfail and the like) inside, there might be problems.
I've also opened a corresponding issue; some additional info/tweaks about this problem may appear there.
EDIT: my first reaction would be "that is exactly what parametrized fixtures are for": a function-scoped fixture is a lazy value that is called just before the test node is executed, and by parametrizing the fixture you can predefine as many variants (for example from a database key listing) as you like.
import pytest
from pytest_cases import fixture_plus

@fixture_plus
def db():
    return <todo>

@fixture_plus
@pytest.mark.parametrize("key", [<list_of keys>])
def sample(db, key):
    return db.get(key)

def test_foo(sample):
    return sample
That being said, in some (rare) situations you still need lazy values in a parametrize function, and you do not wish these to be the variants of a parametrized fixture. For those situations, there is now a solution also in pytest-cases, with lazy_value. With it, you can use functions in the parameter values, and these functions get called only when the test at hand is executed.
Here is an example showing two coding styles (switch the use_partial boolean arg to True to enable the other alternative):
from functools import partial
from random import random

import pytest
from pytest_cases import lazy_value

database = [random() for i in range(10)]

def get_param(i):
    return database[i]

def make_param_getter(i, use_partial=False):
    if use_partial:
        return partial(get_param, i)
    else:
        def _get_param():
            return database[i]
        return _get_param

many_lazy_parameters = (make_param_getter(i) for i in range(10))

@pytest.mark.parametrize('a', [lazy_value(f) for f in many_lazy_parameters])
def test_foo(a):
    print(a)
Note that lazy_value also has an id argument if you wish to customize the test IDs. The default is to use the function's __name__, and support for partial functions is on the way.
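For example, a small sketch that overrides the ID, building on the make_param_getter helper from the example above (the id keyword is the one mentioned in the text):

@pytest.mark.parametrize('a', [lazy_value(make_param_getter(3), id='third_param')])
def test_custom_id(a):
    print(a)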
You can parametrize fixtures the same way, but remember that you have to use @fixture_plus instead of @pytest.fixture. See the pytest-cases documentation for details.
I'm the author of pytest-cases, by the way ;)