I want to resample a DataFrame to every five seconds, where the timestamps of the original data are irregular. Apologies if this looks like a duplicate question, but I have issues with the interpolation lining up to the timestamps of the data, which is why I include my DataFrame in this question. The graph in this answer shows my desired results, but I cannot use the traces package suggested there. I am using pandas 0.19.0.
Consider the following climb path of an aircraft (as dict on pastebin):
Altitude Time
1 0.00 0.00000
2 1000.00 16.45350
3 2000.00 33.19584
4 3000.00 50.25330
5 4000.00 67.64580
6 5000.00 85.38720
7 6000.00 103.56720
8 7000.00 122.29260
9 8000.00 141.61440
10 9000.00 161.59140
11 9999.67 182.27940
12 10000.30 182.33940
13 10000.30 199.76880
14 10000.30 199.82880
15 11000.00 221.67660
16 12000.00 244.36260
17 13000.00 267.93900
18 14000.00 292.46940
19 15000.00 318.01080
20 16000.00 344.36820
21 17000.00 371.32200
22 18000.00 398.91420
23 19000.00 427.19100
24 20000.00 456.24900
25 21000.00 486.38940
26 22000.00 517.91640
27 23000.00 550.96140
28 24000.00 585.65460
29 25000.00 622.12800
30 26000.00 660.35400
31 27000.00 700.37400
32 28000.00 742.39200
33 29000.00 786.57600
34 30000.00 833.13000
35 31000.00 882.09000
36 32000.00 933.46200
37 33000.00 987.40800
38 34000.00 1044.06000
39 35000.00 1103.85000
40 36000.00 1167.52200
41 36088.90 1173.39000
42 36089.60 1173.45000
43 36671.70 1216.60200
44 36672.40 1216.66200
45 38000.00 1295.80200
46 39000.00 1368.45000
47 40000.00 1458.00000
48 41000.00 1574.08200
49 42000.00 1730.97000
50 42231.00 1775.19600
First, I have tried resampling while keeping the original index intact, as shown in this question, so I could then linearly interpolate, but I found no method of interpolation that produces correct results (note the original time column that only matches at 16.45s):
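import numpy as np
import pandas as pd

# Index by timestamp, keeping Time as a column for comparison
df = df.set_index(pd.to_datetime(df['Time'], unit='s'), drop=False)
# Regular 5 s grid, merged into the original index as all-NaN rows
resample_index = pd.date_range(start=df.index[0], end=df.index[-1], freq='5s')
dummy_frame = pd.DataFrame(np.nan, index=resample_index, columns=df.columns)
df.combine_first(dummy_frame).interpolate().iloc[:6]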
Time Altitude
1970-01-01 00:00:00.000000 0.000000 0.0
1970-01-01 00:00:05.000000 4.113375 250.0
1970-01-01 00:00:10.000000 8.226750 500.0
1970-01-01 00:00:15.000000 12.340125 750.0
1970-01-01 00:00:16.453500 16.453500 1000.0
1970-01-01 00:00:20.000000 20.639085 1250.0
Second, I tried resampling without keeping the original index, first down to 1s and then up to 5s, as shown in this answer, but the interpolation values do not line up at the end of the data, nor do the altitude values (1000ft should be between 15 and 20 seconds). Just resampling to 1s already produces wrong results.
df.resample('1s').interpolate(method='linear').resample('5s').asfreq()
Time Altitude
1970-01-01 00:00:00 0.0 0.000000
1970-01-01 00:00:05 5.0 137.174211
1970-01-01 00:00:10 10.0 274.348422
1970-01-01 00:00:15 15.0 411.522634
1970-01-01 00:00:20 20.0 548.696845
1970-01-01 00:00:25 25.0 685.871056
1970-01-01 00:00:30 30.0 823.045267
1970-01-01 00:00:35 35.0 960.219479
1970-01-01 00:00:40 40.0 1097.393690
1970-01-01 00:00:45 45.0 1234.567901
1970-01-01 00:00:50 50.0 1371.742112
1970-01-01 00:00:55 55.0 1508.916324
1970-01-01 00:01:00 60.0 1646.090535
1970-01-01 00:01:05 65.0 1783.264746
1970-01-01 00:01:10 70.0 1920.438957
1970-01-01 00:01:15 75.0 2057.613169
1970-01-01 00:01:20 80.0 2194.787380
1970-01-01 00:01:25 85.0 2331.961591
1970-01-01 00:01:30 90.0 2469.135802
1970-01-01 00:01:35 95.0 2606.310014
1970-01-01 00:01:40 100.0 2743.484225
1970-01-01 00:01:45 105.0 2880.658436
1970-01-01 00:01:50 110.0 3017.832647
1970-01-01 00:01:55 115.0 3155.006859
1970-01-01 00:02:00 120.0 3292.181070
1970-01-01 00:02:05 125.0 3429.355281
1970-01-01 00:02:10 130.0 3566.529492
1970-01-01 00:02:15 135.0 3703.703704
1970-01-01 00:02:20 140.0 3840.877915
1970-01-01 00:02:25 145.0 3978.052126
... ... ...
1970-01-01 00:27:10 1458.0 40000.000000
1970-01-01 00:27:15 1458.0 40000.000000
1970-01-01 00:27:20 1458.0 40000.000000
1970-01-01 00:27:25 1458.0 40000.000000
1970-01-01 00:27:30 1458.0 40000.000000
1970-01-01 00:27:35 1458.0 40000.000000
1970-01-01 00:27:40 1458.0 40000.000000
1970-01-01 00:27:45 1458.0 40000.000000
1970-01-01 00:27:50 1458.0 40000.000000
1970-01-01 00:27:55 1458.0 40000.000000
1970-01-01 00:28:00 1458.0 40000.000000
1970-01-01 00:28:05 1458.0 40000.000000
1970-01-01 00:28:10 1458.0 40000.000000
1970-01-01 00:28:15 1458.0 40000.000000
1970-01-01 00:28:20 1458.0 40000.000000
1970-01-01 00:28:25 1458.0 40000.000000
1970-01-01 00:28:30 1458.0 40000.000000
1970-01-01 00:28:35 1458.0 40000.000000
1970-01-01 00:28:40 1458.0 40000.000000
1970-01-01 00:28:45 1458.0 40000.000000
1970-01-01 00:28:50 1458.0 40000.000000
1970-01-01 00:28:55 1458.0 40000.000000
1970-01-01 00:29:00 1458.0 40000.000000
1970-01-01 00:29:05 1458.0 40000.000000
1970-01-01 00:29:10 1458.0 40000.000000
1970-01-01 00:29:15 1458.0 40000.000000
1970-01-01 00:29:20 1458.0 40000.000000
1970-01-01 00:29:25 1458.0 40000.000000
1970-01-01 00:29:30 1458.0 40000.000000
1970-01-01 00:29:35 1458.0 40000.000000
How can I go about resampling the original data to 5s while performing a correct interpolation? Am I just using the wrong interpolation method?
The pandas resample() method is a convenience method for frequency conversion and resampling of time-series data. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or datetime-like values must be passed via the on or level keyword.
There are two prerequisites for resampling: the datetime column must have a datetime or timestamp dtype, and it must be the index. Alternatively, the on parameter can name the column to resample on (not supported when upsampling).
For example, to simplify a plot with many data points from hourly records, the data can be aggregated per day with .resample(), summarizing all the values of each day.
Resampling changes the frequency of time-series observations. There are two types: upsampling, where the frequency of the samples is increased (for example from minutes to seconds), and downsampling, where it is decreased (for example from days to months).
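A minimal sketch of the two directions follows. The series s and its values are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical series: four observations, one per minute.
idx = pd.date_range("2024-01-01", periods=4, freq="min")
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Upsampling: move to a finer 30 s grid. The new timestamps have no
# data yet, so they show up as NaN until they are filled or interpolated.
up = s.resample("30s").asfreq()

# Downsampling: move to a coarser 2 min grid by aggregating each bin.
down = s.resample("2min").mean()

print(up)
print(down)
```

Upsampling only creates the empty slots. How they are filled (forward fill, interpolation and so on) is a separate decision.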
After some help from @Martin Schmelzer (thanks!), I found that the first method from the question works when 'time' is passed as the method parameter of pandas' interpolate:
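# df is indexed by timestamp as in the question; np and pd are numpy and pandas
resample_index = pd.date_range(start=df.index[0], end=df.index[-1], freq='5s')
dummy_frame = pd.DataFrame(np.nan, index=resample_index, columns=df.columns)
# method='time' weights each gap by its actual duration
df.combine_first(dummy_frame).interpolate('time').iloc[:6]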
Altitude Time
1970-01-01 00:00:00.000000 0.000000 0.0000
1970-01-01 00:00:05.000000 303.886711 5.0000
1970-01-01 00:00:10.000000 607.773422 10.0000
1970-01-01 00:00:15.000000 911.660133 15.0000
1970-01-01 00:00:16.453500 1000.000000 16.4535
1970-01-01 00:00:20.000000 1211.828215 20.0000
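The difference between the two methods comes down to what pandas uses as the x-axis. According to the pandas documentation, method='linear' ignores the index and treats the values as equally spaced, while method='time' interpolates in proportion to the length of each interval. The sketch below illustrates this using only the first three rows of the data above. The names alt, grid, combined, by_position, by_time and on_grid are my own, chosen for illustration.

```python
import pandas as pd

# First three rows of the climb path, indexed by timestamp.
alt = pd.Series(
    [0.0, 1000.0, 2000.0],
    index=pd.to_datetime([0.0, 16.4535, 33.19584], unit="s"),
)

# Regular 5 s grid spanning the data (0 s, 5 s, ..., 30 s).
grid = pd.date_range(start=alt.index[0], end=alt.index[-1], freq="5s")

# Union of original and grid timestamps; the grid rows start out as NaN.
combined = alt.reindex(alt.index.union(grid))

# 'linear' spaces the NaN rows evenly by position: 250, 500, 750, ...
by_position = combined.interpolate(method="linear")

# 'time' weights each row by its distance in time: about 303.89 at 5 s.
by_time = combined.interpolate(method="time")

# Keep only the regular grid points.
on_grid = by_time.reindex(grid)
print(by_position.head())
print(on_grid)
```

The position-based values (250, 500, 750) match the first attempt in the question, and the time-based values match the output above. The slope of about 137.17 ft per 5 s in the second attempt equals 40000 / 1458 × 5, which is consistent with a straight line drawn between the only two rows whose timestamps fall exactly on whole seconds (0 s and 1458 s).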
I can then resample this to 5 s (or any other interval), and the values on the grid are exact linear interpolations of the original data:
df.combine_first(dummy_frame).interpolate('time').resample('5s').asfreq().head()
Altitude Time
1970-01-01 00:00:00 0.000000 0.0
1970-01-01 00:00:05 303.886711 5.0
1970-01-01 00:00:10 607.773422 10.0
1970-01-01 00:00:15 911.660133 15.0
1970-01-01 00:00:20 1211.828215 20.0
So in the end it turns out I was just using the wrong interpolation method after all.
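As an independent cross-check that does not rely on pandas' resampling machinery, the same numbers can be reproduced with NumPy's np.interp, which interpolates linearly against explicit x-coordinates. This is a swapped-in technique, not what was used above. The array names are assumptions, and only the first three rows of the data are included.

```python
import numpy as np

# First three rows of the climb path: time in seconds, altitude in feet.
time_s = np.array([0.0, 16.4535, 33.19584])
altitude_ft = np.array([0.0, 1000.0, 2000.0])

# Regular 5 s grid inside the sampled range: 0, 5, ..., 30.
grid_s = np.arange(0.0, 35.0, 5.0)

# Piecewise-linear interpolation using the real time values as x.
resampled = np.interp(grid_s, time_s, altitude_ft)

for t, a in zip(grid_s, resampled):
    print(f"{t:5.1f} s  {a:12.6f} ft")
```

Working on the raw seconds column avoids the datetime index entirely, which may be convenient when the timestamps are only elapsed times, as they are here.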