I have the following functions:
import pandas as pd

def main():
    (
        pd.DataFrame({'a': [1, 2, float('NaN')], 'b': [1.0, 2, 3]})
        .dropna(subset=['a'])
        .assign(
            b=lambda x: x['b'] * 2
        )
        .apply(do_something_with_each_row, axis='columns')
    )

def do_something_with_each_row(one_row):
    # do_something_with_row
    print(one_row)
In my test, I want to look at the dataframe built after all the chained operations and check that everything is fine with it before do_something_with_each_row is called. This last function does not return a dataframe (it just iterates over all rows, similarly to iterrows).
I tried to mock the apply function like this:
# need pytest-mock and pytest
import pandas as pd

def test_not_working(mocker):
    mocked_apply = mocker.patch.object(pd.DataFrame, 'apply')
    main()
but in this case, I don't get access to the dataframe that is the input to apply, so I cannot test its content.
I also tried to mock do_something_with_each_row:
# need pytest-mock and pytest
import pandas as pd

def test_not_working_again(mocker):
    mocked_to_something = mocker.patch('path.to.file.do_something_with_each_row')
    main()
but this time I get all the calls with row arguments, but they all have None values.
How could I get the dataframe on which the apply function is called and check that it is indeed the same as the following:
pd.DataFrame({'a': [1, 2], 'b': [2.0, 4]})
I am working with pandas version 0.24.2; an upgrade to pandas 1.0.5 does not change the matter.
I tried searching the pandas issues but didn't find anything about this subject.
If I understood your question correctly, this is one way to get the behavior you want:
def test_i_think_this_is_what_you_asked(mocker):
    original_apply = pd.DataFrame.apply

    def mocked_apply(self, *args, **kw):
        assert len(self) == 2  # self is the pd.DataFrame at the time apply is called
        assert self.a[0] == 1
        assert self.a[1] == 3  # this will fail cause the value is 2
        assert self.b[0] == 2.0
        assert self.b[1] == 4.0
        return original_apply(self, *args, **kw)

    mocker.patch.object(pd.DataFrame, 'apply', side_effect=mocked_apply, autospec=True)
    main()
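If you would rather compare the whole frame against an expected one than assert on individual cells, a variation on the same idea is to record the frame and check it after main() returns. The following is only a sketch under a few assumptions: capture_apply_input and recording_apply are hypothetical names, do_something_with_each_row is replaced by a no-op stand-in, and unittest.mock is used instead of pytest-mock so the snippet is self-contained (mocker.patch.object should accept the same arguments).

```python
from unittest import mock

import pandas as pd


def do_something_with_each_row(one_row):
    # Stand-in for the row-level side effect from the question.
    pass


def main():
    (
        pd.DataFrame({'a': [1, 2, float('NaN')], 'b': [1.0, 2, 3]})
        .dropna(subset=['a'])
        .assign(b=lambda x: x['b'] * 2)
        .apply(do_something_with_each_row, axis='columns')
    )


def capture_apply_input():
    """Run main() and return a copy of the frame that apply was called on."""
    captured = []
    original_apply = pd.DataFrame.apply

    def recording_apply(self, *args, **kwargs):
        # With autospec=True the patched method receives the instance as self.
        captured.append(self.copy())
        return original_apply(self, *args, **kwargs)

    with mock.patch.object(pd.DataFrame, 'apply', autospec=True,
                           side_effect=recording_apply):
        main()
    return captured[0]


def test_frame_passed_to_apply():
    # Column 'a' started out with a NaN, so it is float64 after dropna.
    expected = pd.DataFrame({'a': [1.0, 2.0], 'b': [2.0, 4.0]})
    pd.testing.assert_frame_equal(capture_apply_input(), expected)
```

Note that the expected frame uses floats in column a. Building it with integers, as in pd.DataFrame({'a': [1, 2], 'b': [2.0, 4]}), gives an int64 column, so assert_frame_equal would report a dtype mismatch unless you pass check_dtype=False.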