Python pandas time series interpolation datetime data

Tags: python, pandas

From this question I know how to interpolate a time series when the timestamps are given. I am wondering how to do the reverse: interpolate the timestamps from given values, as in the following example, to estimate the NaT entries.

interval            datetime             
0.782296 2012-11-19 12:40:10
0.795469                 NaT
0.821426 2012-11-19 12:35:10
0.834957                 NaT
0.864383 2012-11-19 12:30:10
0.906240 2012-11-19 12:25:10

P.S. I have tried using df['datetime'].interpolate() directly, but it failed.

natsuapo asked Aug 24 '17 12:08


3 Answers

This seems to work. The code could probably be cleaned up a bit, but you get the gist: convert the datetimes to seconds, interpolate the seconds against the interval values, and convert the result back to datetimes.

from datetime import datetime
import calendar
import pandas as pd

# Create data
df = pd.DataFrame({ 'interval' : [0.782296, 0.795469, 0.821426, 0.834957, 
                                  0.864383, 0.906240],
                    'datetime' : [datetime(2012, 11, 19, 12, 40, 10), pd.NaT,
                                  datetime(2012, 11, 19, 12, 35, 10), pd.NaT,
                                  datetime(2012, 11, 19, 12, 30, 10),
                                  datetime(2012, 11, 19, 12, 25, 10)
                                  ]})


# Cast the datetimes to seconds since the epoch (also recast NaT to NaN).
# calendar.timegm treats the naive datetimes as UTC, which matches
# utcfromtimestamp below. Using time.mktime (local time) here instead would
# shift the result by the local UTC offset.
df['seconds'] = [calendar.timegm(t.timetuple()) if not pd.isna(t) else float('nan') for t in df['datetime']]

# Set the interval as the index, as interpolation uses the index
df.set_index('interval', inplace=True)
# Use the 'values' method to interpolate on the index values rather than on the row spacing
df['interpolated'] = df['seconds'].interpolate('values')
# Cast the interpolated seconds back to datetime
df['datetime2'] = [datetime.utcfromtimestamp(t) for t in df['interpolated']]

# Clean up
df.reset_index(inplace=True)
df = df[['interval', 'datetime2']]

>>> df
   interval                  datetime2
0  0.782296 2012-11-19 12:40:10.000000
1  0.795469 2012-11-19 12:38:29.005878
2  0.821426 2012-11-19 12:35:10.000000
3  0.834957 2012-11-19 12:33:35.503178
4  0.864383 2012-11-19 12:30:10.000000
5  0.906240 2012-11-19 12:25:10.000000

Hope this is what you wanted.
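
The same idea can probably be written with pandas arithmetic alone, which avoids the time-zone handling of the standard library altogether. The following is only a sketch under a few assumptions: the `epoch` variable and the `datetime_filled` column name are made up for illustration, and the seconds are kept as floats, so the result is accurate to well under a millisecond rather than to the exact nanosecond.

```python
import pandas as pd

df = pd.DataFrame({
    "interval": [0.782296, 0.795469, 0.821426, 0.834957, 0.864383, 0.906240],
    "datetime": pd.to_datetime([
        "2012-11-19 12:40:10", None,
        "2012-11-19 12:35:10", None,
        "2012-11-19 12:30:10", "2012-11-19 12:25:10",
    ]),
})

# Reference point for the conversion (an arbitrary choice for this sketch)
epoch = pd.Timestamp("1970-01-01")

# Datetimes as float seconds since the epoch; NaT turns into NaN
seconds = (df["datetime"] - epoch) / pd.Timedelta(seconds=1)

# Interpolate against the interval values instead of the row positions
seconds.index = df["interval"]
filled = seconds.interpolate(method="index")

# Convert the seconds back to datetimes and store them positionally
df["datetime_filled"] = (epoch + pd.to_timedelta(filled, unit="s")).to_numpy()

print(df)
```

Here `method="index"` plays the same role as `'values'` in the code above: both interpolate on the numerical values of the index.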

mortysporty answered Nov 07 '22 13:11


This is very easy with RedBlackPy. It is available for macOS and Linux, for Python 3.

import redblackpy as rb
from datetime import datetime

# there is no need to keep the NaT values; they are redundant
# you can interpolate without adding keys to the container
index = [0.782296, 0.821426, 0.864383, 0.906240]
values = [datetime(2012, 11, 19, 12, 40, 10),
          datetime(2012, 11, 19, 12, 35, 10),
          datetime(2012, 11, 19, 12, 30, 10),
          datetime(2012, 11, 19, 12, 25, 10) ]

# init Series with specific interpolation type (floor, ceil, nn, linear)
data = rb.Series(index=index, values=values, dtype='object',  
                 interpolation='linear')

Now you can access by any key using interpolation!

# the index values at which you want to interpolate
int_index = [0.795469, 0.834957]
# access a key that is not in the series
print( data[int_index[0]] ) # this prints 2012-11-19 12:38:29.005878

# you can change the interpolation type
data.set_interpolation('floor')

print( data[int_index[0]] ) # this prints 2012-11-19 12:40:10

If you want to add the interpolated values to the Series, just use insert or setitem as follows:

# this adds the interpolated values to data
for el in int_index:
    data[el] = data[el]

print(data)

As the most recent interpolation type was 'floor', print(data) gives:

Series object Untitled
0.782296: 2012-11-19 12:40:10
0.795469: 2012-11-19 12:40:10
0.821426: 2012-11-19 12:35:10
0.834957: 2012-11-19 12:35:10
0.864383: 2012-11-19 12:30:10
0.90624: 2012-11-19 12:25:10
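
For readers who cannot install RedBlackPy, a similar lookup at arbitrary keys, without inserting them into the container, can be sketched with NumPy and pandas. This is not RedBlackPy's API: the helper names `lookup_linear` and `lookup_floor` are hypothetical, and the sketch assumes that the known keys are sorted in increasing order and that the looked-up key lies within their range.

```python
import numpy as np
import pandas as pd

# Known points only; the NaT rows are simply left out
known_keys = np.array([0.782296, 0.821426, 0.864383, 0.906240])
known_times = pd.to_datetime([
    "2012-11-19 12:40:10", "2012-11-19 12:35:10",
    "2012-11-19 12:30:10", "2012-11-19 12:25:10",
])

epoch = pd.Timestamp("1970-01-01")
# Known datetimes as float seconds since the epoch
known_seconds = ((known_times - epoch) / pd.Timedelta(seconds=1)).to_numpy()


def lookup_linear(key):
    """Linearly interpolate the datetime at `key` (hypothetical helper)."""
    secs = np.interp(key, known_keys, known_seconds)
    return epoch + pd.to_timedelta(secs, unit="s")


def lookup_floor(key):
    """Return the datetime of the nearest known key at or below `key`."""
    pos = np.searchsorted(known_keys, key, side="right") - 1
    return known_times[pos]


print(lookup_linear(0.795469))
print(lookup_floor(0.795469))
```

`np.interp` needs the known keys in increasing order, which holds for this data. The datetimes themselves may decrease, as they do here.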
Кирилл Солодских answered Nov 07 '22 11:11


A shorter variant of @mortysporty's answer:

>>> import calendar
>>> from datetime import datetime
>>> import numpy as np
>>> df.set_index('interval', inplace=True)
>>> df['datetime'].map(lambda x: calendar.timegm(x.timetuple()) if not pd.isna(x) else np.nan)\
                  .interpolate('values')\
                  .map(datetime.utcfromtimestamp)

interval
0.782296   2012-11-19 12:40:10.000000
0.795469   2012-11-19 12:38:29.005878
0.821426   2012-11-19 12:35:10.000000
0.834957   2012-11-19 12:33:35.503178
0.864383   2012-11-19 12:30:10.000000
0.906240   2012-11-19 12:25:10.000000
Name: datetime, dtype: datetime64[ns]
Maksim Terpilowski answered Nov 07 '22 12:11