
Resample time series excluding nan data

I have a daily time series with many NaN values. I want to resample it to monthly data, but only for months that have fewer than 10 days of NaN values.

I've tried using the resample function like this:

df = 
Date          Sr_1    Sr_2    Sr_3
01/12/1978    32.2    20.8    NaN
02/12/1978    32.2    20.6    NaN
03/12/1978    31.6    22      NaN
04/12/1978    28.2    19.4    NaN
05/12/1978    29.8    22.8    24.6
06/12/1978    32      22.2    25.8
07/12/1978    32.8    23.2    NaN
08/12/1978    29.8    NaN     26.8
09/12/1978    31.4    21.4    25.4
10/12/1978    28.8    24      NaN
11/12/1978    30.8    20      NaN
12/12/1978    32      24      25.6
13/12/1978    33      23.2    25.8
14/12/1978    32.4    22.4    24.6
15/12/1978    30      20.6    NaN
16/12/1978    32.6    21.2    NaN
17/12/1978    33      23.4    NaN
18/12/1978    30.4    20.4    26.4
19/12/1978    32      22.2    NaN
20/12/1978    32.2    NaN     NaN
21/12/1978    32.8    22.8    NaN
22/12/1978    32      22.2    NaN
23/12/1978    32.2    NaN     NaN
24/12/1978    31.4    NaN     NaN
25/12/1978    33      NaN     25.6
26/12/1978    33.4    20.6    NaN
27/12/1978    33.6    22.2    NaN
28/12/1978    33.6    23.4    NaN
29/12/1978    33.8    23.4    NaN
30/12/1978    33.2    NaN     25.2
31/12/1978    33.6    23.4    25.2

df.resample('1MS', how='mean')

The result is:

01/12/1978    31.9    22.1    25.5

But Sr_3 has more than 10 NaN values, so its result should be NaN.

Thanks

anvelascos asked Mar 26 '26 19:03



1 Answer

Here's a hackyish way: first count the number of NaNs in each month, then use where to NaN out the months that have too many.

In [11]: g = df1.groupby(pd.TimeGrouper('1MS'))

Note: the NaNs in each group are counted by combining isnull and sum.

In [12]: g.apply(lambda x: pd.isnull(x).sum()).unstack(1)  # Note: columns match res
Out[12]:
            Sr_1  Sr_2  Sr_3
Date
1978-01-01     0     0     1
1978-02-01     0     0     1
1978-03-01     0     0     1
1978-04-01     0     0     1
1978-05-01     0     0     0
1978-06-01     0     0     0
1978-07-01     0     0     1
1978-08-01     0     1     0
1978-09-01     0     0     0
1978-10-01     0     0     1
1978-11-01     0     0     1
1978-12-01     0     5    13
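
On a recent pandas release, where pd.TimeGrouper is no longer available, the same per-month NaN count can probably be written without apply. The following is only a minimal sketch on a hypothetical toy frame (the column names "a" and "b" and the values are made up for illustration, not taken from the question); it assumes the frame has a DatetimeIndex.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 40 daily rows spanning December 1978 and January 1979.
idx = pd.date_range("1978-12-01", periods=40, freq="D")
df = pd.DataFrame({"a": 1.0, "b": 2.0}, index=idx)
df.iloc[0:12, 1] = np.nan   # 12 missing days for "b" in December
df.iloc[35, 0] = np.nan     # 1 missing day for "a" in January

# isna() gives a boolean frame; summing the booleans per month counts the NaNs.
nan_counts = df.isna().groupby(pd.Grouper(freq="MS")).sum()
print(nan_counts)
```

The result has one row per month start and the same columns as df, so it lines up directly with a monthly mean computed with the same grouper.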

In [13]: under_ten_nan = g.apply(lambda x: pd.isnull(x).sum()).unstack(1) <= 10

Then use where to NaN the entries with more than 10 NaNs (res is not defined above; it is presumably the monthly mean, e.g. g.mean()):

In [14]: res.where(under_ten_nan)
Out[14]:
             Sr_1   Sr_2  Sr_3
Date
1978-01-01  32.20  20.80   NaN
1978-02-01  32.20  20.60   NaN
1978-03-01  31.60  22.00   NaN
1978-04-01  28.20  19.40   NaN
1978-05-01  29.80  22.80  24.6
1978-06-01  32.00  22.20  25.8
1978-07-01  32.80  23.20   NaN
1978-08-01  29.80    NaN  26.8
1978-09-01  31.40  21.40  25.4
1978-10-01  28.80  24.00   NaN
1978-11-01  30.80  20.00   NaN
1978-12-01  32.51  22.36   NaN
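
Putting the two steps together, here is a minimal sketch of the same idea for a recent pandas release, using the December 1978 values from the question. The function name monthly_mean_with_nan_limit and the max_nan parameter are hypothetical names chosen for illustration. Note that the dates in the question are day-first (all 31 rows belong to December 1978); the twelve monthly rows in the output above appear to come from month-first parsing, so when reading such dates from text something like pd.to_datetime(..., dayfirst=True) is probably needed. The sketch sidesteps that by building the index with date_range.

```python
import numpy as np
import pandas as pd


def monthly_mean_with_nan_limit(df, max_nan=10):
    """Monthly mean per column; NaN where a month has more than max_nan missing days."""
    grouper = pd.Grouper(freq="MS")
    means = df.groupby(grouper).mean()              # mean() skips NaN by default
    nan_counts = df.isna().groupby(grouper).sum()   # NaN days per month and column
    return means.where(nan_counts <= max_nan)


nan = np.nan
# Daily values for 1 to 31 December 1978, copied from the question.
dates = pd.date_range("1978-12-01", "1978-12-31", freq="D")
df = pd.DataFrame(
    {
        "Sr_1": [32.2, 32.2, 31.6, 28.2, 29.8, 32, 32.8, 29.8, 31.4, 28.8, 30.8,
                 32, 33, 32.4, 30, 32.6, 33, 30.4, 32, 32.2, 32.8, 32, 32.2,
                 31.4, 33, 33.4, 33.6, 33.6, 33.8, 33.2, 33.6],
        "Sr_2": [20.8, 20.6, 22, 19.4, 22.8, 22.2, 23.2, nan, 21.4, 24, 20,
                 24, 23.2, 22.4, 20.6, 21.2, 23.4, 20.4, 22.2, nan, 22.8, 22.2,
                 nan, nan, nan, 20.6, 22.2, 23.4, 23.4, nan, 23.4],
        "Sr_3": [nan, nan, nan, nan, 24.6, 25.8, nan, 26.8, 25.4, nan, nan,
                 25.6, 25.8, 24.6, nan, nan, nan, 26.4, nan, nan, nan, nan,
                 nan, nan, 25.6, nan, nan, nan, nan, 25.2, 25.2],
    },
    index=dates,
)

res = monthly_mean_with_nan_limit(df)
print(res.round(2))
```

With this data Sr_3 has 20 missing days in December, so its monthly value is masked to NaN, while Sr_1 and Sr_2 keep their means. The comparison uses <= to match the answer above; if the rule should be strictly "fewer than 10 NaN days", change it to nan_counts < max_nan.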
Andy Hayden answered Mar 28 '26 22:03



