
pandas resample dealing with missing data

Tags: python, pandas

I am using pandas to work with monthly data that has some missing values. I would like to use the resample method to compute annual statistics, but only for years with no missing data.

Here is some code and output to demonstrate:

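import pandas as pd
import numpy as np
dates = pd.date_range(start='1980-01', periods=24, freq='M')
# 10 missing months followed by the values 0..13
# (list() is needed on Python 3, where range is not a list)
df = pd.DataFrame([np.nan] * 10 + list(range(14)), index=dates)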

Here is what I obtain if I resample:

In [18]: df.resample('A')
Out[18]: 
          0
1980-12-31  0.5
1981-12-31  7.5

I would like to get np.nan for the 1980-12-31 index, since that year does not have a value for every month. I tried playing with the 'how' argument, but with no luck.

How can I accomplish this?

sbiner asked Oct 31 '22 22:10


1 Answer

I'm sure there's a better way, but in this case you can use:

df.resample('A', how=[np.mean, pd.Series.count, len])

and then drop all rows where count != len, i.e. every year that has at least one missing monthly value (count ignores NaN, len does not).
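Newer pandas releases no longer accept the how argument to resample, so here is a minimal sketch of the same count-versus-length idea using the resampler's own mean, count and size methods. It reuses the data from the question; the variable names (yearly, n_valid, n_total, annual_mean_complete) are just illustrative, and offset objects are used in place of the 'M' and 'A' strings as an assumption to keep the sketch independent of alias changes between pandas versions.

```python
import numpy as np
import pandas as pd

# Same data as in the question: 24 month-end dates, first 10 values missing.
dates = pd.date_range(start="1980-01", periods=24, freq=pd.offsets.MonthEnd())
df = pd.DataFrame([np.nan] * 10 + list(range(14)), index=dates)

# Group the single column (named 0) into calendar years.
yearly = df[0].resample(pd.offsets.YearEnd())

annual_mean = yearly.mean()   # NaN values are skipped by default
n_valid = yearly.count()      # non-NaN observations per year
n_total = yearly.size()       # all rows per year, NaN included

# Keep the mean only where no value was missing; other years become NaN.
annual_mean_complete = annual_mean.where(n_valid == n_total)
print(annual_mean_complete)
```

Note that size only counts rows that are present in the index. If a month is absent altogether rather than stored as NaN, comparing n_valid against a fixed 12 is probably the safer check.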

acushner answered Nov 15 '22 05:11