Why does Pandas' DataFrame.apply method call the function being applied when the DataFrame is empty?
For example:
>>> import pandas as pd
>>> df = pd.DataFrame({"foo": []})
>>> df
Empty DataFrame
Columns: [foo]
Index: []
>>> x = []
>>> df.apply(x.append, axis=1)
Series([], dtype: float64)
>>> x
[Series([], dtype: float64)] # <<< why was the apply callback called with an empty row?
Digging into the Pandas source, it looks like this is the culprit:
if not all(self.shape):
    # How to determine this better?
    is_reduction = False
    try:
        is_reduction = not isinstance(f(_EMPTY_SERIES), Series)
    except Exception:
        pass
    if is_reduction:
        return Series(NA, index=self._get_agg_axis(axis))
    else:
        return self.copy()
It looks like Pandas is calling the function with an empty Series in an attempt to guess whether the result should be a Series or a DataFrame.
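To make the side effect easier to see, here is a minimal pure-Python sketch of the heuristic in the excerpt above. `FakeSeries`, `EMPTY_SERIES`, and `apply_on_empty` are hypothetical stand-ins, chosen so the sketch runs without pandas. It only mirrors the control flow of the excerpt and is not the library's actual code.

```python
class FakeSeries(list):
    """Hypothetical stand-in for pandas.Series (an assumption, not the real class)."""


EMPTY_SERIES = FakeSeries()  # plays the role of _EMPTY_SERIES in the excerpt


def apply_on_empty(f):
    """Mimic the empty-frame branch: probe f once to guess the result type."""
    is_reduction = False
    try:
        # The probe call: f really runs here, side effects included.
        is_reduction = not isinstance(f(EMPTY_SERIES), FakeSeries)
    except Exception:
        pass  # a failing probe is silently treated as "not a reduction"
    return "Series" if is_reduction else "DataFrame"


x = []
result_kind = apply_on_empty(x.append)
print(result_kind, x)
```

Because `x.append` returns `None` rather than a Series, the probe is classified as a reduction. That is consistent with the session above, where `apply` returned an empty Series and `x` ended up holding the single probe argument.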
I suppose a patch is in order.
Edit: this issue has been patched. The behavior is now documented, and the reduce option can be used to avoid it: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.apply.html