
While resampling, put NaN in the resulting value if there are some NaN values in the source interval

Example:

import pandas as pd
import numpy as np

rng = pd.date_range("2000-01-01", periods=12, freq="T")
ts = pd.Series(np.arange(12), index=rng)
ts["2000-01-01 00:02"] = np.nan
ts
2000-01-01 00:00:00     0.0
2000-01-01 00:01:00     1.0
2000-01-01 00:02:00     NaN
2000-01-01 00:03:00     3.0
2000-01-01 00:04:00     4.0
2000-01-01 00:05:00     5.0
2000-01-01 00:06:00     6.0
2000-01-01 00:07:00     7.0
2000-01-01 00:08:00     8.0
2000-01-01 00:09:00     9.0
2000-01-01 00:10:00    10.0
2000-01-01 00:11:00    11.0
Freq: T, dtype: float64
ts.resample("5min").sum()
2000-01-01 00:00:00     8.0
2000-01-01 00:05:00    35.0
2000-01-01 00:10:00    21.0
Freq: 5T, dtype: float64

In the above example, the sum for the interval 00:00-00:05 is computed as if the missing value were zero. What I want is for the result at 00:00 to be NaN.

Or, maybe I'd like for it to be OK if there's one missing value in the interval, but NaN if there are two missing values in the interval.

How can I do these?

Antonis Christofides asked Jan 27 '23 21:01



1 Answer

To get NaN when the interval contains one or more NaN values:

ts.resample('5min').agg(pd.Series.sum, skipna=False)

To require a minimum of 2 non-NaN values in the interval:

ts.resample('5min').agg(pd.Series.sum, min_count=2)
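As a hedged aside, min_count can also be passed straight to the resampler's sum, without going through agg. The sketch below rebuilds the question's series (using the "min" frequency alias instead of "T", which is my substitution) and illustrates that min_count counts valid values per bin, so a short final bin can become NaN even though nothing is missing from it.

```python
import numpy as np
import pandas as pd

# Rebuild the series from the question; "min" is used in place of "T".
rng = pd.date_range("2000-01-01", periods=12, freq="min")
ts = pd.Series(np.arange(12, dtype="float64"), index=rng)
ts["2000-01-01 00:02"] = np.nan

# Require at least 2 valid (non-NaN) values per 5-minute bin.
at_least_two = ts.resample("5min").sum(min_count=2)

# Require all 5 values of a full bin; the last bin only holds
# 2 observations, so it turns into NaN as well.
all_five = ts.resample("5min").sum(min_count=5)

print(at_least_two)
print(all_five)
```

Because min_count looks at how many valid values are present rather than how many are missing, it only matches "at most k missing" when every bin has the same number of observations.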

For a maximum of 2 NaN values, it seems trickier:

ts.resample('5min').apply(lambda x: x.sum() if x.isnull().sum() <= 2 else np.nan)
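An alternative to the lambda, sketched here under my own assumptions, is to count the missing values per bin separately and mask the sums afterwards. The helper name resample_sum_nan_limit and its max_nan parameter are hypothetical, introduced only for illustration; the series is rebuilt with the "min" alias in place of "T".

```python
import numpy as np
import pandas as pd

def resample_sum_nan_limit(series, rule, max_nan):
    """Sum each bin, returning NaN for bins with more than max_nan missing values.

    Hypothetical helper, not part of pandas.
    """
    totals = series.resample(rule).sum()               # NaN is skipped here
    nan_counts = series.isna().resample(rule).sum()    # missing values per bin
    return totals.where(nan_counts <= max_nan)         # mask bins over the limit

rng = pd.date_range("2000-01-01", periods=12, freq="min")
ts = pd.Series(np.arange(12, dtype="float64"), index=rng)
ts["2000-01-01 00:02"] = np.nan

strict = resample_sum_nan_limit(ts, "5min", max_nan=0)   # any NaN poisons the bin
lenient = resample_sum_nan_limit(ts, "5min", max_nan=1)  # one NaN is tolerated

print(strict)
print(lenient)
```

With max_nan=0 the first bin becomes NaN, and with max_nan=1 it keeps the sum of its remaining values, which corresponds to the two behaviours asked about in the question.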

You might expect ts.resample('5min').sum(skipna=False) to work in the same way as ts.sum(skipna=False), but the implementations are not consistent.

jpp answered Feb 12 '23 21:02
