
Setting pandas global default for skipna to False

Certain Pandas functions, such as sum(), cumsum() and cumprod(), have a skipna option that is True by default. This causes issues for me, as errors might silently propagate, so I always explicitly set skipna to False.

sum_df = df.sum(skipna=False)

Doing this every time one of these functions appears makes the code look a bit unwieldy. Is there a way to change the default behaviour in Pandas?

Spinor8 asked Oct 16 '22 15:10


1 Answer

Option is not an option (yet)

There seems to be no option to control this behaviour. The default is hard-coded:

import inspect
import pandas as pd

inspect.getfile(pd.DataFrame.sum)    # './pandas/core/generic.py'
inspect.getsource(pd.DataFrame.sum)

# @Substitution(outname=name, desc=desc, name1=name1, name2=name2,
#                  axis_descr=axis_descr, min_count=_min_count_stub,
#                  see_also=see_also, examples=examples)
# @Appender(_num_doc)
# def stat_func(self, axis=None, skipna=None, level=None, numeric_only=None,
# [...]

It could be a good idea for a pull request.
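To read a declared default programmatically rather than scanning the source, the standard library's inspect.signature can be used. The following is only a minimal sketch: default_of and the stand-in function total are hypothetical names introduced for illustration and are not part of Pandas.

```python
import inspect

def default_of(func, name):
    """Return the declared default of parameter `name`.

    Returns inspect.Parameter.empty if the parameter has no default,
    and raises KeyError if the function has no such parameter.
    """
    return inspect.signature(func).parameters[name].default

# Hypothetical stand-in that mimics a reduction with a skipna flag
def total(values, skipna=True):
    return sum(values)

print(default_of(total, "skipna"))  # True
```

Calling default_of(pd.DataFrame.sum, "skipna") in the same way should report whatever default the installed Pandas version declares, which may differ between versions.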

A simple solution

This is probably not the best solution, and it is a bit hackish, but it does address your problem.

I am not saying that it is good practice in general. It may have drawbacks that I have not addressed (you are welcome to list them in a comment). Still, this solution has the advantage of being non-intrusive.

Additionally, although it is quite a simple technique and relies only on the Python standard library, it may violate the Principle of Least Astonishment (see this answer for details).

MCVE

Let's build a wrapper that overrides existing default parameters or adds extra parameters:

import functools

def set_default(func, **default):
    @functools.wraps(func)            # Keep the original name and docstring
    def inner(*args, **kwargs):
        kwargs.update(default)        # Force the given values, even over kwargs passed by the caller
        return func(*args, **kwargs)  # Call function w/ updated kwargs
    return inner                      # Return decorated function

Then, we can decorate any function. For instance:

import pandas as pd
pd.DataFrame.sum = set_default(pd.DataFrame.sum, skipna=False)

The sum method of every DataFrame object now has skipna forced to False on each call. The following code:

import numpy as np
df = pd.DataFrame([1., 2., np.nan])
df.sum()

Returns:

0   NaN
dtype: float64

Instead of:

0    3.0
dtype: float64
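One side effect of the wrapper above is that it forces the value even when the caller explicitly passes skipna=True. If the new value should only act as a default, the merge order can be reversed so that explicit keyword arguments win. This is a sketch under that assumption: set_overridable_default and the stand-in function total are hypothetical names, and total only imitates a Pandas reduction so that the example runs without Pandas.

```python
import functools
import math

def set_overridable_default(func, **default):
    @functools.wraps(func)
    def inner(*args, **kwargs):
        merged = {**default, **kwargs}  # Caller's keyword arguments take precedence
        return func(*args, **merged)
    return inner

# Hypothetical stand-in for a reduction with a skipna flag
def total(values, skipna=True):
    if skipna:
        values = [v for v in values if not math.isnan(v)]
    return sum(values)

total = set_overridable_default(total, skipna=False)

data = [1.0, 2.0, float("nan")]
print(total(data))               # nan
print(total(data, skipna=True))  # 3.0
```

Note that a value passed positionally would still clash with the injected keyword and raise a TypeError, so this variant only helps when skipna is passed by name.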

Automation

We can apply this modification to many functions at once:

for key in ['sum', 'mean', 'std']:
    setattr(pd.DataFrame, key, set_default(getattr(pd.DataFrame, key), skipna=False))

If we store these modifications in a Python module (.py file), they are applied at import time, without any need to modify the Pandas code itself.
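Because such a module changes behaviour globally, it can help to keep the original methods around so the patch can be undone. The following is a minimal sketch of that idea, not an established Pandas facility: patch_defaults, restore and the stand-in class Frame are hypothetical names, and Frame only imitates a DataFrame so that the example runs without Pandas.

```python
import functools
import math

def set_default(func, **default):
    # Same forcing wrapper as in the answer
    @functools.wraps(func)
    def inner(*args, **kwargs):
        kwargs.update(default)
        return func(*args, **kwargs)
    return inner

def patch_defaults(cls, names, **default):
    """Wrap each named method of cls and return the originals."""
    originals = {name: getattr(cls, name) for name in names}
    for name, func in originals.items():
        setattr(cls, name, set_default(func, **default))
    return originals

def restore(cls, originals):
    """Put the saved original methods back on cls."""
    for name, func in originals.items():
        setattr(cls, name, func)

# Hypothetical stand-in class with a skipna-aware reduction
class Frame:
    def __init__(self, values):
        self.values = values

    def sum(self, skipna=True):
        values = self.values
        if skipna:
            values = [v for v in values if not math.isnan(v)]
        return sum(values)

frame = Frame([1.0, 2.0, float("nan")])
saved = patch_defaults(Frame, ["sum"], skipna=False)
print(frame.sum())  # nan
restore(Frame, saved)
print(frame.sum())  # 3.0
```

With Pandas, the equivalent call would presumably be patch_defaults(pd.DataFrame, ['sum', 'mean', 'std'], skipna=False). As far as I know, pd.Series carries its own copies of these methods, so it would need to be patched separately.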

jlandercy answered Nov 03 '22 01:11