Interpolate and fill pandas dataframe with datetime index

Tags: python, pandas
Hi, I'm trying to interpolate a DataFrame that has a DatetimeIndex.

Here's the data:

res = pd.DataFrame(cursor.execute("SELECT DATETIME,VALUE FROM {} WHERE DATETIME > ? AND DATETIME < ?".format(table),[start,end]).fetchall(),columns=['date','value'])
res.set_index('date',inplace=True)

which produces

2013-01-31 00:00:00   517  
2012-12-31 00:00:00   263  
2012-11-30 00:00:00  1917  
2012-10-31 00:00:00   391  
2012-09-30 00:00:00   782  
2012-08-31 00:00:00   700  
2012-07-31 00:00:00   799  
2012-06-30 00:00:00   914  
2012-05-31 00:00:00   141  
2012-04-30 00:00:00   342  
2012-03-31 00:00:00   199  
2012-02-29 00:00:00   533  
2012-01-31 00:00:00  1393  
2011-12-31 00:00:00   497  
2011-11-30 00:00:00  1457  
2011-10-31 00:00:00   997  
2011-09-30 00:00:00   533  
2011-08-31 00:00:00   626  
2011-07-31 00:00:00  1933  
2011-06-30 00:00:00  4248  
2011-05-31 00:00:00  1248  
2011-04-30 00:00:00   904  
2011-03-31 00:00:00  3280  
2011-02-28 00:00:00   390  
2011-01-31 00:00:00   601  
2010-12-31 00:00:00   423  
2010-11-30 00:00:00   748  
2010-10-31 00:00:00   433  
2010-09-30 00:00:00   734  
2010-08-31 00:00:00   845  
2010-07-31 00:00:00  1693  
2010-06-30 00:00:00  2742  
2010-05-31 00:00:00   669

The dates are not contiguous: there is only one value per month. I want a daily value, so I want to fill in the missing days using some kind of interpolation.

I first tried to build a daily index, reindex to it, and then interpolate.

from datetime import date
import pandas as pd

new_index = pd.date_range(date(2010,1,1),date(2014,1,31),freq='D')
df2 = res.reindex(new_index) # Every value comes back as NaN
df2.interpolate('cubic') # Fails with TypeError: Cannot interpolate with all NaNs.

What I hope to get back is a DataFrame with a row for every date between 2010 and 2014, where each value is interpolated from the points surrounding it.

There is probably a simple way to do this, but I'm not sure what it is.


asked May 05 '15 14:05

Delta_Fore

2 Answers

Here's one way to do it.

First, build a new daily index spanning the minimum and maximum dates in df.index, and reindex to it:

In [152]: df_reindexed = df.reindex(pd.date_range(start=df.index.min(),
                                                  end=df.index.max(),
                                                  freq='1D'))

Then use interpolate(method='linear') on the reindexed frame to fill in the values.

In [153]: df_reindexed.interpolate(method='linear')
Out[153]:
                  Value
2010-05-31   669.000000
2010-06-01   738.100000
2010-06-02   807.200000
2010-06-03   876.300000
2010-06-04   945.400000
2010-06-05  1014.500000
...
2013-01-25   467.838710
2013-01-26   476.032258
2013-01-27   484.225806
2013-01-28   492.419355
2013-01-29   500.612903
2013-01-30   508.806452
2013-01-31   517.000000

[977 rows x 1 columns]
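For anyone who wants to try this end to end, here is a minimal self-contained sketch of the reindex-then-interpolate approach, using three of the month-end rows from the question. One assumption that is not confirmed by the question: the all-NaN result from reindex may be because the database returned the dates as strings rather than datetimes, so the sketch converts the index with pd.to_datetime first. The names res and daily are just illustrative.

```python
import pandas as pd

# Three month-end rows from the question, newest first, with the dates
# stored as strings (an assumption about what the database returned).
res = pd.DataFrame(
    {
        "date": ["2010-07-31 00:00:00", "2010-06-30 00:00:00", "2010-05-31 00:00:00"],
        "value": [1693, 2742, 669],
    }
).set_index("date")

# Make sure the index is a real DatetimeIndex and is in ascending order,
# so that its labels can match the daily index built below.
res.index = pd.to_datetime(res.index)
res = res.sort_index()

# Build a daily index covering the data, reindex to it (new days become NaN),
# then fill the gaps by linear interpolation.
new_index = pd.date_range(start=res.index.min(), end=res.index.max(), freq="D")
daily = res.reindex(new_index).interpolate(method="linear")

print(daily.head())
```

If the index is already a DatetimeIndex, the pd.to_datetime call is harmless. Checking res.index.dtype before reindexing is a quick way to see which case applies.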


answered Oct 11 '22 01:10

Zero

As an add-on to @JohnGalt's answer, you could also use resample, which is slightly more convenient than reindex here:

df.resample('D').interpolate('cubic')

                  value
date                   
2010-05-31   669.000000
2010-06-01   830.400272
2010-06-02   983.988431
2010-06-03  1129.919466
2010-06-04  1268.348368
2010-06-05  1399.430127
2010-06-06  1523.319734

...

2010-06-25  2716.850752
2010-06-26  2729.445324
2010-06-27  2738.102544
2010-06-28  2742.977403
2010-06-29  2744.224892
2010-06-30  2742.000000
2010-07-01  2736.454249
2010-07-02  2727.725284
2010-07-03  2715.947277
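A small runnable sketch of the resample route, again on three rows from the question. It uses method='linear' so that it runs with pandas alone; as far as I know, the 'cubic' method shown above is delegated to SciPy, so it needs SciPy installed. The names df and daily are illustrative.

```python
import pandas as pd

# Three month-end rows from the question, with a proper DatetimeIndex.
df = pd.DataFrame(
    {"value": [669, 2742, 1693]},
    index=pd.to_datetime(["2010-05-31", "2010-06-30", "2010-07-31"]),
)
df.index.name = "date"

# resample('D') upsamples to one row per calendar day between the first and
# last dates; interpolate() then fills the newly created NaN rows.
daily = df.resample("D").interpolate(method="linear")

# With SciPy installed, a smoother curve can be requested instead:
# daily_cubic = df.resample("D").interpolate(method="cubic")

print(daily.head())
```

Compared with reindex, there is no need to build the date range by hand, because resample derives it from the index.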

answered Oct 11 '22 02:10

JohnE
