 

Filling date gaps in pandas dataframe

I have a pandas DataFrame (loaded from a .csv) with a date-time index, where there should be one entry per day. The problem is that I have gaps, i.e. there are days for which I have no data at all. What is the easiest way to insert rows (days) in the gaps? Also, is there a way to control what is inserted in the columns as data? Say 0, or a copy of the previous day's values, or values that slide up or down from the previous date's value toward the next date's value.

thanks

Here is an example; 01-03 and 01-04 are missing:

In [60]: df['2015-01-06':'2015-01-01']
Out[60]: 
           Rate  High (est)  Low (est)
Date                                      
2015-01-06  1.19643      0.0000     0.0000
2015-01-05  1.20368      1.2186     1.1889
2015-01-02  1.21163      1.2254     1.1980
2015-01-01  1.21469      1.2282     1.2014

Still experimenting, but this seems to solve the problem:

df.set_index(pd.DatetimeIndex(df.Date),inplace=True)

and then resample. The reason is that importing the .csv with a header column named Date does not actually create a date-time index but a FrozenList, whatever that means, while resample() expects a DatetimeIndex: if isinstance(ax, DatetimeIndex): .....
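The separate conversion step can also be skipped by asking read_csv to parse the dates while loading. This is only a minimal sketch, assuming the file has a Date column as in the example above; the inline CSV text stands in for the real file and reuses the Rate values shown earlier.

```python
import io
import pandas as pd

# Stand-in for the real .csv file (01-03 and 01-04 are missing).
csv_text = """Date,Rate
2015-01-06,1.19643
2015-01-05,1.20368
2015-01-02,1.21163
2015-01-01,1.21469
"""

# parse_dates + index_col make read_csv build a DatetimeIndex directly.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(type(df.index).__name__)  # DatetimeIndex
```

With the index already a DatetimeIndex, resample() can be called on df without the set_index step.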


Here is my final solution :

  #make dates the index
  self.df.set_index(pd.DatetimeIndex(self.df.Date), inplace=True)
  #fill the gaps (older pandas releases spelled this resample('D', fill_method='pad'))
  self.df = self.df.resample('D').ffill()
  #fix the Date column
  self.df.Date = self.df.index.values

I had to fix the Date column, because resample() only lets you pad it. It does fix the index correctly, though, so I could use the index to fix the Date column.
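The Date-column fix can probably be avoided by keeping Date only in the index during the resample and restoring it as a column afterwards. A minimal sketch, assuming a current pandas release; the two-column frame reuses the rows from the example above, and the name filled is just illustrative.

```python
import pandas as pd

# Rows from the example above, with Date as an ordinary column.
df = pd.DataFrame({
    "Date": ["2015-01-01", "2015-01-02", "2015-01-05", "2015-01-06"],
    "Rate": [1.21469, 1.21163, 1.20368, 1.19643],
})
df["Date"] = pd.to_datetime(df["Date"])

# Move Date into the index, pad the missing days, then bring Date back
# as a column. Since Date was never a data column during the resample,
# it is rebuilt from the corrected index rather than padded.
filled = df.set_index("Date").resample("D").ffill().reset_index()

print(filled)
```

Here the inserted rows for 01-03 and 01-04 carry their own dates in the Date column and the padded Rate from 01-02.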

Here is a snippet of the data after correction:

2015-01-29 2015-01-29  1.13262      0.0000     0.0000
2015-01-30 2015-01-30  1.13161      1.1450     1.1184
2015-01-31 2015-01-31  1.13161      1.1450     1.1184
2015-02-01 2015-02-01  1.13161      1.1450     1.1184

01-30 and 01-31 are the newly generated rows.

sten asked Feb 11 '15 23:02


1 Answer

You could resample by day, e.g. using mean if there are multiple entries per day:

# older pandas releases spelled this df.resample('D', how='mean')
df.resample('D').mean()

You can then ffill to replace NaNs with the previous day's result.

See up and down sampling in the docs.
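The three fill options from the question (0, the previous day's values, or values sliding toward the next day's) could be sketched as follows. This assumes a current pandas release and reuses the Rate values from the question; the variable names are illustrative only.

```python
import pandas as pd

# Rows from the question (01-03 and 01-04 are missing).
df = pd.DataFrame(
    {"Rate": [1.21469, 1.21163, 1.20368, 1.19643]},
    index=pd.DatetimeIndex(
        ["2015-01-01", "2015-01-02", "2015-01-05", "2015-01-06"], name="Date"
    ),
)

# Resampling by day inserts the missing days as rows of NaN.
daily = df.resample("D").mean()

zeros = daily.fillna(0)        # fill the gaps with 0
padded = daily.ffill()         # copy the previous day's values
sliding = daily.interpolate()  # linear steps between the neighbouring days

print(sliding)
```

interpolate() defaults to linear interpolation over equally spaced rows, which matches the calendar here because the resampled index has exactly one row per day.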

Andy Hayden answered Oct 13 '22 00:10