Lazy parametrization with pytest

When parametrizing tests and fixtures in pytest, pytest seems to eagerly evaluate all parameters and to build a test list data structure before starting to execute any tests.

This is a problem in two situations:

  1. when you have many parameter values (e.g. from a generator) - the generator and the test itself may run fast, but all those parameter values eat up the memory
  2. when parametrizing a fixture with different kinds of expensive resources, where you can only afford to run one resource at a time (e.g. because they listen on the same port, or something like that)

Thus my question: Is it possible to tell pytest to evaluate the parameters on the fly (i.e. lazily)?

asked Jan 12 '19 by maxschlepzig



2 Answers

As for your second question: the link to the manual proposed in the comments seems to be exactly what one should do. It allows you "to setup expensive resources like DB connections or subprocess only when the actual test is run". A minimal sketch of that technique follows.
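
A minimal sketch, assuming the linked manual section is pytest's indirect parametrization: the raw parameter is handed to a fixture via request.param, and the expensive setup happens only when the test actually runs (the resource names here are illustrative):

import pytest

@pytest.fixture
def resource(request):
    # request.param is the raw value from parametrize; the expensive setup
    # (e.g. opening a DB connection) would happen here, once per test run
    return {"connected_to": request.param}

@pytest.mark.parametrize("resource", ["db-a", "db-b"], indirect=True)
def test_uses_resource(resource):
    assert resource["connected_to"] in ("db-a", "db-b")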


But as for your first question, it seems such a feature is not implemented. You may pass a generator directly to parametrize like so:

@pytest.mark.parametrize('data', data_gen)
def test_gen(data):
    ...

But pytest will call list() on your generator, so the RAM problem persists here as well.

I've also found some GitHub issues that shed more light on why pytest does not handle generators lazily. It seems to be a design constraint: "it's not possible to correctly manage parametrization having a generator as value" because

"pytest would have to collect all those tests with all the metadata... collection happens always before test running".

There are also some references to hypothesis or nose's yield-based tests for such cases. But if you still want to stick with pytest, there are some workarounds:

  1. If you somehow know the number of generated params, you may do the following:
import pytest

def get_data(N):
    # each chunk is produced on demand, one per test run
    for i in range(N):
        yield list(range(N))

N = 3000
data_gen = get_data(N)

@pytest.mark.parametrize('ind', range(N))
def test_yield(ind):
    data = next(data_gen)  # generated only now, when this test runs
    assert data

So here you parametrize over an index (which is not so useful by itself - it just tells pytest how many runs to make) and generate the data inside each run. You may also wrap it with memory_profiler; a possible run_test.py is sketched below, followed by its output:
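
A possible run_test.py, reconstructed from the profiler output below (memory_profiler's profile decorator prints the line-by-line memory table):

import pytest
from memory_profiler import profile

@profile
def to_profile():
    # launch the whole test session inside the profiled function
    pytest.main(['test.py'])

if __name__ == '__main__':
    to_profile()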

Results (46.53s):
    3000 passed
Filename: run_test.py

Line #    Mem usage    Increment   Line Contents
================================================
     5     40.6 MiB     40.6 MiB   @profile
     6                             def to_profile():
     7     76.6 MiB     36.1 MiB       pytest.main(['test.py'])

And compare with the straightforward approach:

@pytest.mark.parametrize('data', data_gen)
def test_yield(data):
    assert data

Which 'eats' much more memory:

Results (48.11s):
    3000 passed
Filename: run_test.py

Line #    Mem usage    Increment   Line Contents
================================================
     5     40.7 MiB     40.7 MiB   @profile
     6                             def to_profile():
     7    409.3 MiB    368.6 MiB       pytest.main(['test.py'])

  2. If you want to parametrize your test over other params at the same time, you may generalize the previous approach like so:
data_gen = get_data(N)

# one fixture instantiation per generated chunk; the number of chunks
# (or an estimate of it) must be known upfront
@pytest.fixture(scope='module', params=range(len_of_gen_if_known))
def fix(request):
    # request.param is unused here - params only control how many
    # times the fixture (and its dependent tests) run
    huge_data_chunk = next(data_gen)
    return huge_data_chunk


@pytest.mark.parametrize('other_param', ['aaa', 'bbb'])
def test_one(fix, other_param):
    data = fix
    ...

So we use a fixture at module scope here in order to "preset" our data for the parametrized test. Note that you may add another test right here and it will receive the generated data as well - simply add it after test_one:

@pytest.mark.parametrize('param2', [15, 'asdb', 1j])
def test_two(fix, param2):
    data = fix
    ...

NOTE: if you do not know the number of generated data items, you may use this trick: set some approximate value (preferably a bit higher than the expected count of generated tests) and treat the surplus runs as passed or skipped once they hit StopIteration, which happens when all data has already been generated. A sketch of this follows.
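
A minimal sketch of that trick, assuming the get_data generator from the first workaround; the surplus fixture instantiations skip instead of failing:

import pytest

APPROX = 3100  # rough overestimate of the unknown number of chunks
data_gen = get_data(3000)

@pytest.fixture(scope='module', params=range(APPROX))
def fix(request):
    try:
        return next(data_gen)
    except StopIteration:
        # all data consumed; end the surplus runs gracefully
        pytest.skip('generator exhausted')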

  3. Another possibility is to use factories as fixtures (see the sketch after this item). Here you embed your generator into a fixture and iterate in your test until the generator is exhausted. But this has another disadvantage: pytest will treat it as a single test (with possibly a bunch of checks inside) and will fail as a whole if any piece of generated data fails. In other words, compared to the parametrize approach, not all pytest statistics/features are available.
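
A minimal sketch of the factory-as-fixture idea (names and sizes are illustrative):

import pytest

@pytest.fixture
def data_factory():
    # the fixture hands the test a generator function instead of a chunk
    def _gen():
        for i in range(3000):
            yield list(range(100))
    return _gen

def test_all_chunks(data_factory):
    # one single pytest test looping over every chunk;
    # the first bad chunk fails the whole test
    for chunk in data_factory():
        assert chunk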

  4. And yet another option is to run pytest.main() in a loop, launching a fresh test session per generated chunk (set_up_test below is a hypothetical helper that hands the chunk to the test module):

for chunk in data_gen:
    set_up_test(chunk)     # make the chunk available to the test
    pytest.main(['test'])

  5. This one does not concern iterators themselves; it is rather a way to save more time/RAM if one has a parametrized test: simply move some of the parametrization inside the test. Example:
@pytest.mark.parametrize("one", list_1)
@pytest.mark.parametrize("two", list_2)
def test_maybe_convert_objects(self, one, two):
    ...

Change to:

@pytest.mark.parametrize("one", list_1)
def test_maybe_convert_objects(self, one):
    for two in list_2:
        ...

It's similar to factories but even easier to implement. It not only reduces RAM usage several times over, but also the time spent collecting metainfo. The drawbacks: for pytest it is one single test for all the `two` values, and it only works smoothly with "simple" tests - if one has special marks (xfail etc.) inside, there might be problems.

I've also opened a corresponding issue; some additional info/tweaks about this problem may appear there.

answered Sep 21 '22 by BeforeFlight

EDIT: my first reaction would be "that is exactly what parametrized fixtures are for": a function-scoped fixture is a lazy value, called just before the test node is executed, and by parametrizing the fixture you can predefine as many variants (for example from a database key listing) as you like.

from pytest_cases import fixture_plus

@fixture_plus
def db():
    return <todo>

@fixture_plus
@pytest.mark.parametrize("key", [<list_of keys>])
def sample(db, key):
    return db.get(key)

def test_foo(sample):
    assert sample is not None

That being said, in some (rare) situations you still need lazy values in a parametrize function, and you do not wish these to be the variants of a parametrized fixture. For those situations, there is now a solution also in pytest-cases, with lazy_value. With it, you can use functions in the parameter values, and these functions get called only when the test at hand is executed.

Here is an example showing two coding styles (switch the use_partial boolean arg to True to enable the other alternative):

from functools import partial
from random import random

import pytest
from pytest_cases import lazy_value

database = [random() for i in range(10)]

def get_param(i):
    return database[i]


def make_param_getter(i, use_partial=False):
    if use_partial:
        return partial(get_param, i)
    else:
        def _get_param():
            return database[i]

        return _get_param

many_lazy_parameters = (make_param_getter(i) for i in range(10))

@pytest.mark.parametrize('a', [lazy_value(f) for f in many_lazy_parameters])
def test_foo(a):
    print(a)

Note that lazy_value also has an id argument if you wish to customize the test ids. The default is to use the function __name__, and support for partial functions is on the way. A small sketch of the id argument follows.
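
A small sketch of that id argument, reusing the database list from the example above (the function name and id string are illustrative):

from pytest_cases import lazy_value

def first_entry():
    return database[0]

# the explicit id replaces the default id derived from __name__
@pytest.mark.parametrize('a', [lazy_value(first_entry, id='first-db-entry')])
def test_bar(a):
    print(a)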

You can parametrize fixtures the same way, but remember that you have to use @fixture_plus instead of @pytest.fixture. See the pytest-cases documentation for details.

I'm the author of pytest-cases by the way ;)

answered Sep 19 '22 by smarie