
Python splines or other interpolations that work with time on x-axis?

I'm trying to use the awfully useful pandas to handle data as time series, and I'm now stumbling over the fact that there don't seem to be any libraries that can directly interpolate (with a spline or similar) over data that has DateTime on the x-axis. I always seem to be forced to convert to some floating-point number first, like seconds since 1980 or something like that.

Here is what I have tried so far. Sorry for the odd formatting: this only exists in an IPython notebook, and I can't copy cells from there:

from scipy.interpolate import InterpolatedUnivariateSpline as IUS
type(bb2temp): pandas.core.series.TimeSeries
s = IUS(bb2temp.index.to_pydatetime(), bb2temp, k=1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-67-19c6b8883073> in <module>()
----> 1 s = IUS(bb2temp.index.to_pydatetime(), bb2temp, k=1)

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/scipy/interpolate/fitpack2.py in __init__(self, x, y, w, bbox, k)
    335         #_data == x,y,w,xb,xe,k,s,n,t,c,fp,fpint,nrdata,ier
    336         self._data = dfitpack.fpcurf0(x,y,k,w=w,
--> 337                                       xb=bbox[0],xe=bbox[1],s=0)
    338         self._reset_class()
    339 

TypeError: float() argument must be a string or a number
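The first TypeError is consistent with to_pydatetime() returning a NumPy object array of datetime.datetime instances, which cannot be turned into floats. A small sketch with made-up timestamps reproduces that conversion step outside SciPy (the np.asarray call is only an approximation of what the spline code does internally):

```python
import numpy as np
import pandas as pd

# Made-up timestamps for illustration.
idx = pd.date_range("2009-07-20", periods=3, freq="30s")
objs = idx.to_pydatetime()   # NumPy array of datetime.datetime objects
print(objs.dtype)            # object

try:
    # Roughly what happens when the spline code asks for float input.
    np.asarray(objs, dtype=float)
except TypeError as err:
    print("TypeError:", err)
```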

Using bb2temp.index.values as the x-argument instead (the values look like this:

array([1970-01-15 184:00:35.884999, 1970-01-15 184:00:58.668999,
       1970-01-15 184:01:22.989999, 1970-01-15 184:01:45.774000,
       1970-01-15 184:02:10.095000, 1970-01-15 184:02:32.878999,
       1970-01-15 184:02:57.200000, 1970-01-15 184:03:19.984000,

), the spline class does, interestingly, create an interpolator, but it still breaks when I try to interpolate/extrapolate onto a larger DatetimeIndex (which is my final goal here). Here is how that looks:

all_times = divcal.timed.index.levels[2] # part of a MultiIndex

all_times
<class 'pandas.tseries.index.DatetimeIndex'>
[2009-07-20 00:00:00.045000, ..., 2009-07-20 00:30:00.018000]
Length: 14063, Freq: None, Timezone: None

s(all_times.values) # applying the above generated interpolator
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-74-ff11f6d6d7da> in <module>()
----> 1 s(tall.values)

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/scipy/interpolate/fitpack2.py in __call__(self, x, nu)
    219 #            return dfitpack.splev(*(self._eval_args+(x,)))
    220 #        return dfitpack.splder(nu=nu,*(self._eval_args+(x,)))
--> 221         return fitpack.splev(x, self._eval_args, der=nu)
    222 
    223     def get_knots(self):

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/scipy/interpolate/fitpack.py in splev(x, tck, der, ext)
    546 
    547         x = myasarray(x)
--> 548         y, ier =_fitpack._spl_(x, der, t, c, k, ext)
    549         if ier == 10:
    550             raise ValueError("Invalid input data")

TypeError: array cannot be safely cast to required type

I also tried s(all_times) and s(all_times.to_pydatetime()), and got the same TypeError: array cannot be safely cast to required type.
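A minimal sketch of the usual workaround, assuming a float-valued Series with a tz-naive DatetimeIndex and made-up sample data; the helper name spline_on_times is hypothetical. Both the data times and the target times are converted to float seconds relative to the same origin (the first sample), so the spline only ever sees floats, and the result is wrapped back into a Series indexed by the target DatetimeIndex:

```python
import numpy as np
import pandas as pd
from scipy.interpolate import InterpolatedUnivariateSpline as IUS


def spline_on_times(series, new_index, k=1):
    """Hypothetical helper: fit a spline to a time-indexed Series, evaluate on new_index."""
    series = series.sort_index()          # FITPACK needs increasing x values
    new_index = pd.DatetimeIndex(new_index)
    origin = series.index[0]              # shared origin for both time axes
    one_second = pd.Timedelta(seconds=1)
    # DatetimeIndex - Timestamp gives timedeltas; dividing by 1 s gives float seconds.
    x = np.asarray((series.index - origin) / one_second, dtype=float)
    x_new = np.asarray((new_index - origin) / one_second, dtype=float)
    spline = IUS(x, series.to_numpy(dtype=float), k=k)
    # InterpolatedUnivariateSpline extrapolates outside the data range by default (ext=0).
    return pd.Series(spline(x_new), index=new_index)


# Made-up data: samples every 30 s, evaluated every 15 s, partly beyond the last sample.
times = pd.date_range("2009-07-20 00:00", periods=5, freq="30s")
temps = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], index=times)
fine = pd.date_range("2009-07-20 00:00", "2009-07-20 00:03", freq="15s")
result = spline_on_times(temps, fine)
```

Measuring from the first sample rather than from 1970 keeps the x-values small, which avoids the precision loss that very large float timestamps can cause.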

Am I, sadly, correct? Has everybody become so used to converting times to floats that nobody thought these interpolations should just work automatically? (Then I would finally have found a super-useful project to contribute to...) Or would you like to prove me wrong and earn some SO points? ;)
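For the linear case, pandas itself can interpolate on a DatetimeIndex without manual conversion: Series.interpolate(method="time") weights by the actual time gaps between rows. Note that this is linear interpolation only, not a spline. A small sketch with made-up values, which first inserts the target timestamps as NaN rows:

```python
import pandas as pd

# Made-up, irregularly spaced samples.
times = pd.DatetimeIndex(["2009-07-20 00:00:00", "2009-07-20 00:00:10", "2009-07-20 00:00:40"])
temps = pd.Series([0.0, 1.0, 4.0], index=times)

new_times = pd.DatetimeIndex(["2009-07-20 00:00:05", "2009-07-20 00:00:20"])
# Add the new timestamps as NaN rows, then fill them by interpolating linearly
# in elapsed time rather than treating rows as equally spaced.
combined = temps.reindex(temps.index.union(new_times))
filled = combined.interpolate(method="time")
print(filled.loc[new_times])
```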

Edit: Warning: check your pandas data for NaNs before you hand it to the interpolation routines. They will not complain; they just silently fail.
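A minimal sketch of that check, assuming a float-valued Series (the helper name drop_nonfinite is hypothetical): it drops NaN rows, and infinite values as a precaution, before the data reaches the spline.

```python
import numpy as np
import pandas as pd


def drop_nonfinite(series):
    """Hypothetical helper: keep only rows whose value is finite (drops NaN and +/-inf)."""
    mask = np.isfinite(series.to_numpy(dtype=float))
    return series[mask]


# Made-up data with one missing value.
times = pd.date_range("2009-07-20", periods=4, freq="10s")
raw = pd.Series([1.0, np.nan, 3.0, 4.0], index=times)
clean = drop_nonfinite(raw)
print(len(raw), len(clean))   # 4 3
```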

asked Dec 18 '12 at 21:12 by K.-Michael Aye



1 Answer

The problem is that the FITPACK routines used underneath require floats, so at some point the datetimes have to be converted to floats. This conversion is easy. If bb2temp.index.values is your datetime array, just do:

In [1]: bb2temp.index.values.astype('d')
Out[1]: 
array([  1.22403588e+12,   1.22405867e+12,   1.22408299e+12,
         1.22410577e+12,   1.22413010e+12,   1.22415288e+12,
         1.22417720e+12,   1.22419998e+12])

Pass that array to your spline, and convert the times you want to evaluate at in the same way. If results is an array of such float times, results.astype('datetime64') converts it back to datetime objects.
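A sketch of the same round trip with current NumPy/pandas, assuming nanosecond-resolution, tz-naive timestamps (the sample values are made up). It casts through int64 rather than the 2012-era astype('d') call, and measures time from the first sample, because absolute nanoseconds since 1970 are too large to store exactly as floats:

```python
import numpy as np
import pandas as pd

idx = pd.DatetimeIndex(["2009-07-20 00:00:00.045", "2009-07-20 00:30:00.018"])
times = idx.to_numpy(dtype="datetime64[ns]")

# Absolute nanoseconds since 1970 as floats: around 2009 these are ~1.2e18,
# where float64 values are spaced 256 ns apart, so sub-microsecond detail is lost.
absolute = times.astype(np.int64).astype(float)

# Relative to the first sample the numbers stay small.
origin = times[0]
x = (times - origin) / np.timedelta64(1, "s")   # float seconds since origin

# ... fit the spline on x and evaluate it at other times converted the same way ...

# Back to datetimes: seconds -> integer nanoseconds -> timedelta64, plus origin.
restored = origin + np.round(x * 1e9).astype(np.int64).astype("timedelta64[ns]")
```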

answered Oct 13 '22 at 00:10 by tiago
