For certain Pandas functions, such as sum(), cumsum() and cumprod(), there is a skipna option, which is set to True by default. This causes issues for me, as errors might silently propagate, so I always explicitly set skipna to False.
sum_df = df.sum(skipna=False)
Doing it every time one of these functions appears makes the code look a bit unwieldy. Is there a way I can change the default behaviour in Pandas?
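To make the problem concrete, here is a minimal example with made-up data. It uses a single column, but the DataFrame methods behave the same way column by column.

```python
import numpy as np
import pandas as pd

# Toy column with one missing value (made-up data)
s = pd.Series([1.0, np.nan, 3.0])

# Default skipna=True: the NaN is silently ignored
total_default = s.sum()                            # 4.0
running_default = s.cumsum().tolist()              # [1.0, nan, 4.0]

# skipna=False: the NaN propagates, so the problem becomes visible
total_strict = s.sum(skipna=False)                 # nan
running_strict = s.cumsum(skipna=False).tolist()   # [1.0, nan, nan]

print(total_default, running_default)
print(total_strict, running_strict)
```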
It seems there is no such option to control this behaviour. It is hard-coded:
import inspect
import pandas as pd
inspect.getfile(pd.DataFrame.sum) # './pandas/core/generic.py'
inspect.getsource(pd.DataFrame.sum)
# @Substitution(outname=name, desc=desc, name1=name1, name2=name2,
# axis_descr=axis_descr, min_count=_min_count_stub,
# see_also=see_also, examples=examples)
# @Appender(_num_doc)
# def stat_func(self, axis=None, skipna=None, level=None, numeric_only=None,
# [...]
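The printed source differs between Pandas releases, so a version-independent check may help. The sketch below (assuming a recent Pandas version) reads the default straight from the method signature with `inspect.signature` and checks that `pd.get_option` knows no global option named `skipna`. Pandas raises `OptionError` for unknown keys, which is a subclass of `KeyError`.

```python
import inspect

import pandas as pd

# The default is part of the method signature, not a configurable setting
skipna_default = inspect.signature(pd.DataFrame.sum).parameters["skipna"].default
print("DataFrame.sum skipna default:", skipna_default)

# No global option exists for it: get_option raises OptionError (a KeyError subclass)
try:
    pd.get_option("skipna")
    has_skipna_option = True
except KeyError:
    has_skipna_option = False
print("global 'skipna' option exists:", has_skipna_option)
```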
It could be a good idea for a pull request.
This is probably not the best solution, and it is a bit hackish, but it does address your problem.
I am not saying that it is good practice in general. It may have drawbacks that I have not addressed (you are welcome to list them in the comments). Anyway, this solution has the advantage of being non-intrusive.
Additionally, although it is quite a simple technique that uses only the Python Standard Library (PSL), it may violate the Principle of Least Astonishment (see this answer for details).
Let's build a wrapper that forces the given keyword arguments on every call, overriding existing default parameters or adding extra ones:
import functools

def set_default(func, **default):
    @functools.wraps(func)  # Keep the original name and docstring, so help(df.sum) still works
    def inner(*args, **kwargs):
        kwargs.update(default)  # Update function kwargs w/ decorator defaults
        return func(*args, **kwargs)  # Call function w/ updated kwargs
    return inner  # Return decorated function
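One consequence of `kwargs.update(default)` is that it also overwrites arguments the caller passes explicitly, so `df.sum(skipna=True)` would still run with `skipna=False`. If you would rather have a real default that can be overridden per call, a variant based on `dict.setdefault` does that. In the sketch below, `set_soft_default` and the toy function `total` are hypothetical names used only to compare the two behaviours without involving Pandas.

```python
import functools


def set_default(func, **default):
    """Approach from this answer: the decorator values always win."""
    @functools.wraps(func)
    def inner(*args, **kwargs):
        kwargs.update(default)
        return func(*args, **kwargs)
    return inner


def set_soft_default(func, **default):
    """Hypothetical variant: decorator values apply only when the caller omits them."""
    @functools.wraps(func)
    def inner(*args, **kwargs):
        for key, value in default.items():
            kwargs.setdefault(key, value)  # Keep any value passed explicitly
        return func(*args, **kwargs)
    return inner


def total(values, skipna=True):
    """Hypothetical stand-in for a reduction: None plays the role of NaN."""
    if skipna:
        values = [v for v in values if v is not None]
    return None if None in values else sum(values)


hard = set_default(total, skipna=False)
soft = set_soft_default(total, skipna=False)

data = [1, None, 3]
print(hard(data), hard(data, skipna=True))  # None None (explicit skipna=True is ignored)
print(soft(data), soft(data, skipna=True))  # None 4 (explicit skipna=True is honoured)
```

Which behaviour is preferable depends on whether you want to make `skipna=True` impossible or merely opt-in.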
Then, we can decorate any function. For instance:
import pandas as pd
pd.DataFrame.sum = set_default(pd.DataFrame.sum, skipna=False)
Then, the sum method of DataFrame objects has its skipna overridden to False each time we call it. Now, the following code:
import numpy as np
df = pd.DataFrame([1., 2., np.nan])
df.sum()
Returns:
0 NaN
dtype: float64
Instead of:
0 3.0
dtype: float64
We can apply this modification to many functions at once:
for key in ['sum', 'mean', 'std']:
    setattr(pd.DataFrame, key, set_default(getattr(pd.DataFrame, key), skipna=False))
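The loop above only patches `DataFrame`, so calling a method on a single column (a `Series`, e.g. `df['x'].sum()`) still uses `skipna=True`. Here is a sketch that patches both classes and covers the methods named in the question (`sum`, `cumsum`, `cumprod`). The method list is just an example, and `set_default` is repeated so the snippet runs on its own.

```python
import functools

import numpy as np
import pandas as pd


def set_default(func, **default):
    @functools.wraps(func)
    def inner(*args, **kwargs):
        kwargs.update(default)  # Decorator values take precedence
        return func(*args, **kwargs)
    return inner


# Patch both classes, since df["col"].sum() goes through Series, not DataFrame
for cls in (pd.DataFrame, pd.Series):
    for key in ("sum", "cumsum", "cumprod"):
        setattr(cls, key, set_default(getattr(cls, key), skipna=False))

s = pd.Series([1.0, np.nan, 3.0])
print(s.sum())                                         # nan
print(s.cumsum().tolist())                             # [1.0, nan, nan]
print(pd.DataFrame({"x": s}).cumprod()["x"].tolist())  # [1.0, nan, nan]
```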
If we store those modifications in a Python module (a .py file), they will be applied whenever that module is imported, without the need to modify the Pandas code itself.
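Patching at import time affects every caller in the process, including other libraries that use Pandas. If that is too broad, the same wrapper can be scoped with a context manager that restores the original methods on exit. This is a sketch built on `contextlib.contextmanager`; `strict_skipna` is a hypothetical helper name, not part of Pandas.

```python
import contextlib
import functools

import numpy as np
import pandas as pd


def set_default(func, **default):
    @functools.wraps(func)
    def inner(*args, **kwargs):
        kwargs.update(default)  # Decorator values take precedence
        return func(*args, **kwargs)
    return inner


@contextlib.contextmanager
def strict_skipna(classes=(pd.DataFrame, pd.Series), names=("sum", "cumsum", "cumprod")):
    """Hypothetical helper: force skipna=False only inside the with-block."""
    originals = {}
    for cls in classes:
        for name in names:
            originals[(cls, name)] = getattr(cls, name)
            setattr(cls, name, set_default(originals[(cls, name)], skipna=False))
    try:
        yield
    finally:
        # Restore the original methods even if the block raised
        for (cls, name), original in originals.items():
            setattr(cls, name, original)


s = pd.Series([1.0, np.nan, 3.0])
with strict_skipna():
    inside = s.sum()   # nan: patched inside the block
outside = s.sum()      # 4.0: original behaviour restored
print(inside, outside)
```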