
Get the input dataframe of apply with mocking

I have the following functions:

import pandas as pd


def main():
    (
        pd.DataFrame({'a': [1, 2, float('NaN')], 'b': [1.0, 2, 3]})
        .dropna(subset=['a'])
        .assign(
            b=lambda x: x['b'] * 2
        )
        .apply(do_something_with_each_row, axis='columns')
    )

def do_something_with_each_row(one_row):
    # do_something_with_row
    print(one_row)

In my test, I want to inspect the dataframe produced by all the chained operations and check that it is correct before do_something_with_each_row is called. That last function does not return a dataframe (it just iterates over all rows, similarly to iterrows).

I tried to mock the apply function like this:

# need pytest-mock and pytest
import pandas as pd


def test_not_working(mocker):
    mocked_apply = mocker.patch.object(pd.DataFrame, 'apply')
    main()

but in this case, I don't get access to the dataframe that is passed to apply, so I cannot test its content.

I also tried to mock the do_something_with_each_row:

# need pytest-mock and pytest
import pandas as pd


def test_not_working_again(mocker):
    mocked_to_something = mocker.patch('path.to.file.do_something_with_each_row')
    main()

but this time the mock records one call per row, and the row argument of every recorded call contains only None values.

How can I get the dataframe on which apply is called and check that it is indeed equal to the following:

pd.DataFrame({'a': [1, 2], 'b': [2.0, 4]})

I am working with pandas 0.24.2; upgrading to pandas 1.0.5 does not change the behavior.

I searched the pandas issues but didn't find anything on this subject.

ndclt asked Jun 11 '20 11:06




1 Answer

If I understood your question correctly, this is one way to get the behavior you want:

def test_i_think_this_is_what_you_asked(mocker):
    # Keep a reference to the real method so the wrapper can delegate to it.
    original_apply = pd.DataFrame.apply

    def mocked_apply(self, *args, **kw):
        # self is the pd.DataFrame at the time apply is called
        assert len(self) == 2
        assert self.a[0] == 1
        assert self.a[1] == 3  # this fails because the actual value is 2
        assert self.b[0] == 2.0
        assert self.b[1] == 4.0
        return original_apply(self, *args, **kw)

    # autospec=True makes the mock receive the dataframe as its first argument.
    mocker.patch.object(pd.DataFrame, 'apply', side_effect=mocked_apply, autospec=True)
    main()
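
A variation on the same idea, as a rough sketch rather than a definitive recipe: instead of asserting inside the wrapper, store a copy of the dataframe and compare it afterwards with pd.testing.assert_frame_equal. The names capture_apply_input and recording_apply are hypothetical, the body of do_something_with_each_row is a stand-in, and unittest.mock is used in place of pytest-mock only so that the snippet runs on its own. Note that column a is float64 because it held a NaN before dropna, so the expected frame is written with floats here; passing check_dtype=False would be another option.

```python
from unittest import mock

import pandas as pd


def do_something_with_each_row(one_row):
    # Stand-in for the row-wise side effect from the question (assumption).
    return None


def main():
    (
        pd.DataFrame({'a': [1, 2, float('NaN')], 'b': [1.0, 2, 3]})
        .dropna(subset=['a'])
        .assign(b=lambda x: x['b'] * 2)
        .apply(do_something_with_each_row, axis='columns')
    )


def capture_apply_input():
    """Run main() and return copies of every dataframe passed to apply."""
    captured = []
    original_apply = pd.DataFrame.apply

    def recording_apply(self, *args, **kwargs):
        # Copy the frame so later mutations cannot alter what was recorded.
        captured.append(self.copy())
        return original_apply(self, *args, **kwargs)

    # A plain function set on the class is bound like a method,
    # so `self` is the dataframe that apply is called on.
    with mock.patch.object(pd.DataFrame, 'apply', recording_apply):
        main()
    return captured


frames = capture_apply_input()

# Floats in column 'a' because the original column contained a NaN.
expected = pd.DataFrame({'a': [1.0, 2.0], 'b': [2.0, 4.0]})
pd.testing.assert_frame_equal(frames[0].reset_index(drop=True), expected)
```

Collecting the frame first and asserting afterwards keeps the comparison in the test body, so a mismatch is reported by assert_frame_equal with a column-level diff instead of from inside the patched method.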
Alexander Pivovarov answered Oct 18 '22 01:10