From this question I know how to interpolate a time series with given timestamps. I am wondering how to do the reverse: interpolate timestamps from given values, as in the following example, to estimate the NaT values.
interval datetime
0.782296 2012-11-19 12:40:10
0.795469 NaT
0.821426 2012-11-19 12:35:10
0.834957 NaT
0.864383 2012-11-19 12:30:10
0.906240 2012-11-19 12:25:10
P.S. I tried using df['datetime'].interpolate() directly, but it failed.
This seems to work. The code could probably be cleaned up a bit, but you get the gist of it:
import calendar
from datetime import datetime, timezone
import pandas as pd
# Create data
df = pd.DataFrame({'interval': [0.782296, 0.795469, 0.821426, 0.834957,
                                0.864383, 0.906240],
                   'datetime': [datetime(2012, 11, 19, 12, 40, 10), pd.NaT,
                                datetime(2012, 11, 19, 12, 35, 10), pd.NaT,
                                datetime(2012, 11, 19, 12, 30, 10),
                                datetime(2012, 11, 19, 12, 25, 10)]})
# Cast dates to seconds since the epoch (NaT becomes NaN).
# calendar.timegm treats the naive datetimes as UTC, so converting back below
# does not shift the result by the local UTC offset (time.mktime would).
df['seconds'] = [calendar.timegm(t.timetuple()) if not pd.isna(t) else float('nan')
                 for t in df['datetime']]
# Set the interval as the index, as interpolation uses the index
df.set_index('interval', inplace=True)
# Use method='values' to interpolate against the index values, not the row spacing
df['interpolated'] = df['seconds'].interpolate(method='values')
# Cast the interpolated seconds back to naive datetimes
df['datetime2'] = [datetime.fromtimestamp(t, timezone.utc).replace(tzinfo=None)
                   for t in df['interpolated']]
# Clean up
df.reset_index(inplace=True)
df = df[['interval', 'datetime2']]
>>> df
Out[25]:
interval datetime2
0 0.782296 2012-11-19 12:40:10.000000
1 0.795469 2012-11-19 12:38:29.005878
2 0.821426 2012-11-19 12:35:10.000000
3 0.834957 2012-11-19 12:33:35.503178
4 0.864383 2012-11-19 12:30:10.000000
5 0.906240 2012-11-19 12:25:10.000000
Hope this is what you wanted.
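If you would rather skip the epoch-seconds round trip (and any time zone conversion with it), one option is to interpolate offsets from a reference timestamp instead. This is only a minimal sketch, assuming a reasonably recent pandas; the names origin, offset, filled and the datetime_filled column are my own and not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    "interval": [0.782296, 0.795469, 0.821426, 0.834957, 0.864383, 0.906240],
    "datetime": pd.to_datetime([
        "2012-11-19 12:40:10", None,
        "2012-11-19 12:35:10", None,
        "2012-11-19 12:30:10", "2012-11-19 12:25:10",
    ]),
})

# Seconds elapsed since the earliest known timestamp; NaT becomes NaN.
origin = df["datetime"].min()
offset = (df["datetime"] - origin).dt.total_seconds()

# Interpolate the offsets against the 'interval' values, not the row positions.
offset.index = df["interval"].to_numpy()
filled = offset.interpolate(method="index")

# Convert back to timestamps, rounded to whole microseconds.
restored = origin + pd.to_timedelta(filled.to_numpy(), unit="s")
df["datetime_filled"] = restored.round("us")
print(df)
```

Because only differences between timestamps are converted to floats, the naive datetimes are never interpreted in any time zone.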
This is very easy with RedBlackPy. It is available for macOS and Linux for Python 3.
import redblackpy as rb
from datetime import datetime
# there is no need to keep the NaT values; they are redundant
# you can interpolate without adding keys to the container
index = [0.782296, 0.821426, 0.864383, 0.906240]
values = [datetime(2012, 11, 19, 12, 40, 10),
datetime(2012, 11, 19, 12, 35, 10),
datetime(2012, 11, 19, 12, 30, 10),
datetime(2012, 11, 19, 12, 25, 10) ]
# init Series with specific interpolation type (floor, ceil, nn, linear)
data = rb.Series(index=index, values=values, dtype='object',
interpolation='linear')
Now you can access by any key using interpolation!
# the index values where you want to interpolate
int_index = [0.795469, 0.834957]
# access a key that is not in the series
print( data[int_index[0]] ) # this prints 2012-11-19 12:38:29.005878
# you can change the interpolation type
data.set_interpolation('floor')
print( data[int_index[0]] ) # this prints 2012-11-19 12:40:10
If you want to add interpolated values to Series just use insert or setitem as follows:
# this adds the interpolated values to data
for el in int_index:
    data[el] = data[el]
print(data)
Since the most recent interpolation type was 'floor', print(data) gives:
Series object Untitled
0.782296: 2012-11-19 12:40:10
0.795469: 2012-11-19 12:40:10
0.821426: 2012-11-19 12:35:10
0.834957: 2012-11-19 12:35:10
0.864383: 2012-11-19 12:30:10
0.90624: 2012-11-19 12:25:10
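For reference, the 'linear' and 'floor' results above can be reproduced with the standard library alone. This is a rough sketch of the arithmetic, not RedBlackPy's implementation; lookup is a hypothetical helper I made up for illustration, and it assumes the keys are sorted in ascending order.

```python
from bisect import bisect_right
from datetime import datetime

keys = [0.782296, 0.821426, 0.864383, 0.906240]
stamps = [datetime(2012, 11, 19, 12, 40, 10),
          datetime(2012, 11, 19, 12, 35, 10),
          datetime(2012, 11, 19, 12, 30, 10),
          datetime(2012, 11, 19, 12, 25, 10)]


def lookup(key, mode="linear"):
    """Hypothetical helper: estimate the timestamp at `key` from sorted `keys`."""
    if not keys[0] <= key <= keys[-1]:
        raise KeyError(key)  # no extrapolation in this sketch
    i = bisect_right(keys, key) - 1  # position of the largest key <= `key`
    if mode == "floor" or i == len(keys) - 1:
        return stamps[i]
    # Linear: move from stamps[i] toward stamps[i + 1] by the fraction of the
    # key gap covered so far (a float times a timedelta is a timedelta).
    weight = (key - keys[i]) / (keys[i + 1] - keys[i])
    return stamps[i] + weight * (stamps[i + 1] - stamps[i])


print(lookup(0.795469))           # 2012-11-19 12:38:29.005878
print(lookup(0.795469, "floor"))  # 2012-11-19 12:40:10
```

Here 'floor' simply returns the value stored at the nearest key below the query, which is why both inserted rows in the printed Series repeat the timestamp of the row before them.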
A shorter variant of @montysporty's answer:
>>> import calendar
>>> from datetime import datetime, timezone
>>> import numpy as np
>>> df.set_index('interval', inplace=True)
>>> df['datetime'].map(lambda x: calendar.timegm(x.timetuple()) if not pd.isna(x) else np.nan)\
.interpolate(method='values')\
.map(lambda s: datetime.fromtimestamp(s, timezone.utc).replace(tzinfo=None))
interval
0.782296 2012-11-19 12:40:10.000000
0.795469 2012-11-19 12:38:29.005878
0.821426 2012-11-19 12:35:10.000000
0.834957 2012-11-19 12:33:35.503178
0.864383 2012-11-19 12:30:10.000000
0.906240 2012-11-19 12:25:10.000000
Name: datetime, dtype: datetime64[ns]
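If pandas is not a requirement, the same idea also fits in a few lines of plain NumPy with np.interp. Treat this as a sketch under my own assumptions: the arrays are rebuilt by hand from the question's data, the interval values must be increasing (np.interp requires that), and the result is rounded to whole microseconds.

```python
import numpy as np

interval = np.array([0.782296, 0.795469, 0.821426, 0.834957, 0.864383, 0.906240])
stamps = np.array(["2012-11-19T12:40:10", "NaT", "2012-11-19T12:35:10", "NaT",
                   "2012-11-19T12:30:10", "2012-11-19T12:25:10"],
                  dtype="datetime64[us]")

known = ~np.isnat(stamps)      # mask of rows that already have a timestamp
origin = stamps[known].min()   # reference point, keeps the float offsets small

# Offsets of the known timestamps from the origin, in integer microseconds.
offsets = (stamps[known] - origin).astype("int64")

# Linear interpolation of the offsets at every interval value.
estimated = np.interp(interval, interval[known], offsets)

# Back to datetime64, rounded to the nearest microsecond.
filled = origin + np.round(estimated).astype("int64").astype("timedelta64[us]")
print(filled)
```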