Pandas timeseries resampling and interpolating together

Tags: pandas, time-series, python-3.5, interpolation

I have timestamped sensor data. For technical reasons, the sensors deliver readings only at approximately one-minute intervals. The data may look like this:

   tstamp               val
0  2016-09-01 00:00:00  57
1  2016-09-01 00:01:00  57
2  2016-09-01 00:02:23  57
3  2016-09-01 00:03:04  57
4  2016-09-01 00:03:58  58
5  2016-09-01 00:05:00  60

Ideally, I would get every reading exactly on the minute, but I don't. The only way to preserve the distribution and still have a value at each minute is to interpolate. For example, rows 1 and 2 are 83 seconds apart, and the natural way to get a value at 00:02:00 is to interpolate between those two readings (here the result is 57, but that is not the case everywhere).
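For reference, linear interpolation in time between two neighboring readings (t_a, v_a) and (t_b, v_b) gives the value at an intermediate timestamp t as below (the symbols are introduced here only for illustration). Applied to rows 4 and 5 of the sample (58 at 00:03:58, 60 at 00:05:00), it gives the value at 00:04:00.

```latex
v(t) = v_a + (v_b - v_a)\,\frac{t - t_a}{t_b - t_a}

v(\text{00:04:00}) = 58 + (60 - 58)\cdot\frac{2\,\text{s}}{62\,\text{s}} = 58 + \frac{4}{62} \approx 58.0645
```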

Right now, my approach is to do the following:

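import numpy as np
import pandas as pd
import scipy.interpolate
# midnight of the day being processed
date = pd.to_datetime(df['tstamp'].iloc[0].date())
# seconds since midnight for each reading
ts_d = (df['tstamp'].dt.hour * 60 * 60 +
        df['tstamp'].dt.minute * 60 +
        df['tstamp'].dt.second)
# every whole minute of the day, in seconds since midnight
ts_r = np.arange(0, 24*60*60, 60)
# linear interpolation; minutes outside the observed range become NaN instead of raising
f = scipy.interpolate.interp1d(x=ts_d, y=df['val'].values,
                               bounds_error=False, fill_value=np.nan)
req = pd.Series(f(ts_r), index=date + pd.to_timedelta(ts_r, unit='s'))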
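A somewhat shorter variant of the same idea, sketched here under the assumption that the readings are sorted by time and use the tstamp/val columns from the sample, swaps scipy.interpolate.interp1d for numpy.interp and measures time in seconds relative to the first whole minute. numpy.interp clamps to the first/last observed value outside the data instead of raising, so the grid below is restricted to whole minutes inside the observed span.

```python
import numpy as np
import pandas as pd

# sample readings from the question (assumed sorted by time)
df = pd.DataFrame({
    "tstamp": pd.to_datetime([
        "2016-09-01 00:00:00", "2016-09-01 00:01:00", "2016-09-01 00:02:23",
        "2016-09-01 00:03:04", "2016-09-01 00:03:58", "2016-09-01 00:05:00",
    ]),
    "val": [57, 57, 57, 57, 58, 60],
})

# whole minutes lying inside the observed time span
grid = pd.date_range(df["tstamp"].min().ceil("min"),
                     df["tstamp"].max().floor("min"), freq="min")

# express readings and grid points as seconds since the first grid point
x_obs = (df["tstamp"] - grid[0]).dt.total_seconds().to_numpy()
x_new = (grid - grid[0]).total_seconds().to_numpy()

# linear interpolation in time; np.interp requires x_obs to be increasing
req = pd.Series(np.interp(x_new, x_obs, df["val"].to_numpy()), index=grid)
print(req)
```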

But this feels rather drawn out to me. There are excellent pandas methods for resampling, rounding, etc. I have been reading about them all day, but none of them interpolates quite the way I want. resample works like a groupby and averages the time points that fall into the same bin. interpolate (or fillna) can fill the resulting gaps, but only after resample has already altered the data by averaging.
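To make the averaging problem concrete, here is a small sketch on the sample data (values copied from the table above): resample("min").mean() averages the two readings that fall in the 00:03 bin and leaves the empty 00:04 bin as NaN, so a later interpolate() only sees bin averages, not the original timestamps.

```python
import pandas as pd

s = pd.Series(
    [57, 57, 57, 57, 58, 60],
    index=pd.to_datetime([
        "2016-09-01 00:00:00", "2016-09-01 00:01:00", "2016-09-01 00:02:23",
        "2016-09-01 00:03:04", "2016-09-01 00:03:58", "2016-09-01 00:05:00",
    ]),
)

# each reading goes into the minute bin it falls in, then each bin is averaged:
# 00:03 -> mean of 57 (00:03:04) and 58 (00:03:58) = 57.5; 00:04 has no reading -> NaN
binned = s.resample("min").mean()

# interpolating afterwards works on the bin averages by row position:
# 00:04 -> halfway between 57.5 and 60 = 58.75, whereas interpolating the raw
# readings by time gives 58 + 4/62 (about 58.06) at 00:04:00
filled = binned.interpolate()
print(pd.DataFrame({"binned": binned, "filled": filled}))
```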

Am I missing something, or is my approach the best there is?

For simplicity, assume that I group the data by day and by sensor, so only a 24-hour period from one sensor is interpolated at a time.


asked Oct 14 '16 02:10

Kartik

2 Answers

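import pandas as pd

d = df.set_index('tstamp')
t = d.index
# one timestamp per minute of the day, starting at midnight
r = pd.date_range(t.min().date(), periods=24*60, freq='T')

# add the minute marks to the index, interpolate by time distance, keep only the minute marks
# (.ix was removed in pandas 1.0; on current pandas use .loc[r])
d.reindex(t.union(r)).interpolate('index').ix[r]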


Note that periods=24*60 assumes a full day of data. For the six-minute sample in the question, use periods=6 instead.
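Why 'index' rather than the default 'linear': after the reindex, the inserted minute marks sit between irregularly spaced readings, and method='linear' treats all rows as equally spaced. The sketch below, again on the question's sample, shows the intermediate union frame and compares both methods at 00:04:00.

```python
import pandas as pd

d = pd.DataFrame(
    {"val": [57, 57, 57, 57, 58, 60]},
    index=pd.to_datetime([
        "2016-09-01 00:00:00", "2016-09-01 00:01:00", "2016-09-01 00:02:23",
        "2016-09-01 00:03:04", "2016-09-01 00:03:58", "2016-09-01 00:05:00",
    ]),
)
r = pd.date_range("2016-09-01 00:00:00", periods=6, freq="min")

# union of original stamps and minute marks; minute marks without a reading become NaN
u = d.reindex(d.index.union(r))

by_position = u.interpolate(method="linear")  # ignores the timestamps, uses row order
by_time = u.interpolate(method="index")       # weights by distance along the DatetimeIndex

t = pd.Timestamp("2016-09-01 00:04:00")
print(u)
print(by_position.loc[t, "val"], by_time.loc[t, "val"])
```

With row-order interpolation, 00:04:00 lands halfway between the rows holding 58 and 60, while the time-based result matches the 58.064516 shown in the output further down.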

answered Oct 19 '22 05:10

piRSquared

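5 years later, pandas has changed a bit: most notably, the .ix indexer was deprecated in pandas 0.20 and removed in 1.0. I've rewritten piRSquared's answer to work with current pandas versions and also fixed the date range issue: the range now runs from the first to the last timestamp in the data instead of a fixed number of periods.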

import pandas as pd
from datetime import datetime

df = pd.DataFrame({"tstamp": [
    datetime(2016, 9, 1, 0, 0, 0),
    datetime(2016, 9, 1, 0, 1, 0),
    datetime(2016, 9, 1, 0, 2, 23),
    datetime(2016, 9, 1, 0, 3, 4),
    datetime(2016, 9, 1, 0, 3, 58),
    datetime(2016, 9, 1, 0, 5, 0)], 
    "val": [57, 57, 57, 57, 58, 60]})


d = df.set_index('tstamp')
t = d.index

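# one timestamp per minute from the first to the last reading
# ('min' is the minute alias; the older 'T' alias is deprecated since pandas 2.2)
r = pd.date_range(t.min(), t.max(), freq='min')

# add the minute marks, interpolate by time distance, keep only the minute marks
d = d.reindex(t.union(r)).interpolate('index').loc[r]
print(d)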

                           val
2016-09-01 00:00:00  57.000000
2016-09-01 00:01:00  57.000000
2016-09-01 00:02:00  57.000000
2016-09-01 00:03:00  57.000000
2016-09-01 00:04:00  58.064516
2016-09-01 00:05:00  60.000000
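One caveat with pd.date_range(t.min(), t.max(), ...): it starts at the first reading, so if that reading is not exactly on a minute (say 00:00:30), the grid becomes 00:00:30, 00:01:30 and so on rather than whole minutes. A possible variant, sketched below, rounds the ends to whole minutes with Timestamp.ceil/Timestamp.floor and applies the same reindex/interpolate step to one sensor-day at a time, as the question describes. The helper to_minute_grid, the sensor column and sensor "b"'s readings are hypothetical.

```python
import pandas as pd


def to_minute_grid(s: pd.Series) -> pd.Series:
    """Interpolate an irregular series onto whole-minute timestamps.

    Assumes `s` has a sorted, duplicate-free DatetimeIndex.
    """
    # whole minutes inside the observed span (ceil the first stamp, floor the last)
    grid = pd.date_range(s.index.min().ceil("min"), s.index.max().floor("min"), freq="min")
    # insert the minute marks, interpolate by time distance, keep only the minute marks
    return s.reindex(s.index.union(grid)).interpolate(method="index").loc[grid]


# hypothetical long-format data: a `sensor` column plus the question's tstamp/val
raw = pd.DataFrame({
    "sensor": ["a"] * 6 + ["b"] * 2,
    "tstamp": pd.to_datetime([
        "2016-09-01 00:00:00", "2016-09-01 00:01:00", "2016-09-01 00:02:23",
        "2016-09-01 00:03:04", "2016-09-01 00:03:58", "2016-09-01 00:05:00",
        "2016-09-01 00:00:30", "2016-09-01 00:02:30",
    ]),
    "val": [57, 57, 57, 57, 58, 60, 10, 30],
})

# one 24-hour period from one sensor at a time
results = {}
for (sensor, day), g in raw.groupby(["sensor", raw["tstamp"].dt.date]):
    series = g.set_index("tstamp")["val"].sort_index()
    results[(sensor, day)] = to_minute_grid(series)

for key, s in results.items():
    print(key)
    print(s)
```

For sensor "b", the grid is 00:01:00 and 00:02:00 (not 00:00:30 and 00:01:30), with values 15 and 25.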

answered Oct 19 '22 05:10

Recessive
