pandas DataFrame resample from irregular timeseries index

Tags:

I want to resample a DataFrame to every five seconds, where the time stamps of the original data are irregular. Apologies if this looks like a duplicate question, but I have issues with the interpolation lining up to the timestamps of the data, which is why I include my DataFrame in this question. The graph in this answer shows my desired results, but I cannot use the traces package suggested there. I use pandas 0.19.0.

Consider the following climb path of an aircraft (as dict on pastebin):

    Altitude        Time
1       0.00     0.00000
2    1000.00    16.45350
3    2000.00    33.19584
4    3000.00    50.25330
5    4000.00    67.64580
6    5000.00    85.38720
7    6000.00   103.56720
8    7000.00   122.29260
9    8000.00   141.61440
10   9000.00   161.59140
11   9999.67   182.27940
12  10000.30   182.33940
13  10000.30   199.76880
14  10000.30   199.82880
15  11000.00   221.67660
16  12000.00   244.36260
17  13000.00   267.93900
18  14000.00   292.46940
19  15000.00   318.01080
20  16000.00   344.36820
21  17000.00   371.32200
22  18000.00   398.91420
23  19000.00   427.19100
24  20000.00   456.24900
25  21000.00   486.38940
26  22000.00   517.91640
27  23000.00   550.96140
28  24000.00   585.65460
29  25000.00   622.12800
30  26000.00   660.35400
31  27000.00   700.37400
32  28000.00   742.39200
33  29000.00   786.57600
34  30000.00   833.13000
35  31000.00   882.09000
36  32000.00   933.46200
37  33000.00   987.40800
38  34000.00  1044.06000
39  35000.00  1103.85000
40  36000.00  1167.52200
41  36088.90  1173.39000
42  36089.60  1173.45000
43  36671.70  1216.60200
44  36672.40  1216.66200
45  38000.00  1295.80200
46  39000.00  1368.45000
47  40000.00  1458.00000
48  41000.00  1574.08200
49  42000.00  1730.97000
50  42231.00  1775.19600

Tried solutions

First, I have tried resampling while keeping the original index intact, as shown in this question, so I could then linearly interpolate, but I found no method of interpolation that produces correct results (note the original time column that only matches at 16.45s):

df = df.set_index(pd.to_datetime(df['Time'], unit='s'), drop=False)
resample_index = pd.date_range(start=df.index[0], end=df.index[-1], freq='5s')
dummy_frame = pd.DataFrame(np.NaN, index=resample_index, columns=df.columns)
df.combine_first(dummy_frame).interpolate().iloc[:6]

                                 Time  Altitude
1970-01-01 00:00:00.000000   0.000000       0.0
1970-01-01 00:00:05.000000   4.113375     250.0
1970-01-01 00:00:10.000000   8.226750     500.0
1970-01-01 00:00:15.000000  12.340125     750.0
1970-01-01 00:00:16.453500  16.453500    1000.0
1970-01-01 00:00:20.000000  20.639085    1250.0

Second, I tried resampling without keeping the original index, first down to 1s and then up to 5s, as shown in this answer, but the interpolation values do not line up at the end of the data, nor do the altitude values (1000ft should be between 15 and 20 seconds). Just resampling to 1s already produces wrong results.

df.resample('1s').interpolate(method='linear').resample('5s').asfreq()

                       Time      Altitude
1970-01-01 00:00:00     0.0      0.000000
1970-01-01 00:00:05     5.0    137.174211
1970-01-01 00:00:10    10.0    274.348422
1970-01-01 00:00:15    15.0    411.522634
1970-01-01 00:00:20    20.0    548.696845
1970-01-01 00:00:25    25.0    685.871056
1970-01-01 00:00:30    30.0    823.045267
1970-01-01 00:00:35    35.0    960.219479
1970-01-01 00:00:40    40.0   1097.393690
1970-01-01 00:00:45    45.0   1234.567901
1970-01-01 00:00:50    50.0   1371.742112
1970-01-01 00:00:55    55.0   1508.916324
1970-01-01 00:01:00    60.0   1646.090535
1970-01-01 00:01:05    65.0   1783.264746
1970-01-01 00:01:10    70.0   1920.438957
1970-01-01 00:01:15    75.0   2057.613169
1970-01-01 00:01:20    80.0   2194.787380
1970-01-01 00:01:25    85.0   2331.961591
1970-01-01 00:01:30    90.0   2469.135802
1970-01-01 00:01:35    95.0   2606.310014
1970-01-01 00:01:40   100.0   2743.484225
1970-01-01 00:01:45   105.0   2880.658436
1970-01-01 00:01:50   110.0   3017.832647
1970-01-01 00:01:55   115.0   3155.006859
1970-01-01 00:02:00   120.0   3292.181070
1970-01-01 00:02:05   125.0   3429.355281
1970-01-01 00:02:10   130.0   3566.529492
1970-01-01 00:02:15   135.0   3703.703704
1970-01-01 00:02:20   140.0   3840.877915
1970-01-01 00:02:25   145.0   3978.052126
...                     ...           ...
1970-01-01 00:27:10  1458.0  40000.000000
1970-01-01 00:27:15  1458.0  40000.000000
1970-01-01 00:27:20  1458.0  40000.000000
1970-01-01 00:27:25  1458.0  40000.000000
1970-01-01 00:27:30  1458.0  40000.000000
1970-01-01 00:27:35  1458.0  40000.000000
1970-01-01 00:27:40  1458.0  40000.000000
1970-01-01 00:27:45  1458.0  40000.000000
1970-01-01 00:27:50  1458.0  40000.000000
1970-01-01 00:27:55  1458.0  40000.000000
1970-01-01 00:28:00  1458.0  40000.000000
1970-01-01 00:28:05  1458.0  40000.000000
1970-01-01 00:28:10  1458.0  40000.000000
1970-01-01 00:28:15  1458.0  40000.000000
1970-01-01 00:28:20  1458.0  40000.000000
1970-01-01 00:28:25  1458.0  40000.000000
1970-01-01 00:28:30  1458.0  40000.000000
1970-01-01 00:28:35  1458.0  40000.000000
1970-01-01 00:28:40  1458.0  40000.000000
1970-01-01 00:28:45  1458.0  40000.000000
1970-01-01 00:28:50  1458.0  40000.000000
1970-01-01 00:28:55  1458.0  40000.000000
1970-01-01 00:29:00  1458.0  40000.000000
1970-01-01 00:29:05  1458.0  40000.000000
1970-01-01 00:29:10  1458.0  40000.000000
1970-01-01 00:29:15  1458.0  40000.000000
1970-01-01 00:29:20  1458.0  40000.000000
1970-01-01 00:29:25  1458.0  40000.000000
1970-01-01 00:29:30  1458.0  40000.000000
1970-01-01 00:29:35  1458.0  40000.000000

The Question

How can I go about resampling the original data to 5s while performing a correct interpolation? Am I just using the wrong interpolation method?

214

asked Mar 09 '18 10:03

Alarik

1 Answers

After some help from @Martin Schmelzer (thanks!) I found the first suggested method from the question to be working, when applying time as the method parameter for pandas' interpolation method:

resample_index = pd.date_range(start=df.index[0], end=df.index[-1], freq='5s')
dummy_frame = pd.DataFrame(np.NaN, index=resample_index, columns=df.columns)
df.combine_first(dummy_frame).interpolate('time').iloc[:6]

                               Altitude     Time
1970-01-01 00:00:00.000000     0.000000   0.0000
1970-01-01 00:00:05.000000   303.886711   5.0000
1970-01-01 00:00:10.000000   607.773422  10.0000
1970-01-01 00:00:15.000000   911.660133  15.0000
1970-01-01 00:00:16.453500  1000.000000  16.4535
1970-01-01 00:00:20.000000  1211.828215  20.0000

I can then resample this to 5s or whatever and the results are exact.

df.combine_first(dummy_frame).interpolate('time').resample('5s').asfreq().head()
                        Altitude  Time
1970-01-01 00:00:00     0.000000   0.0
1970-01-01 00:00:05   303.886711   5.0
1970-01-01 00:00:10   607.773422  10.0
1970-01-01 00:00:15   911.660133  15.0
1970-01-01 00:00:20  1211.828215  20.0

So in the end it turns out I was just using the wrong interpolation method after all.

143

answered Nov 11 '22 14:11

Alarik

Related questions
                            
                                Iterable object and Django StreamingHttpResponse
                            
                                Plane-plane intersection in python [closed]
                            
                                How to implement a custom layer wit multiple outputs in Keras?
                            
                                How to have limited ZMQ (ZeroMQ - PyZMQ) queue buffer size in python?
                            
                                Generating n binary vectors where each vector has a Hamming distance of d from every other vector
                            
                                AWS Elastic Beanstalk failed to install Python package using requirements.txt Git Pip
                            
                                How to run python code line by line in Spyder and include loop/if statement contents
                            
                                How do you serialize a union field in Avro using Python when attributes match
                            
                                How to make a Parameter available to all Luigi Tasks?
                            
                                pytorch variable index lost one dimension
                            
                                Fill oceans in basemap [duplicate]
                            
                                how to make a https request in python 3
                            
                                Python remove hashtag symbol and keep key words
                            
                                pandas shift rows NaNs
                            
                                Plot multiple lines with holoviews
                            
                                how to replace pixel data on same dicom file using pydicom to read it again with any dicom viewer?
                            
                                Is there a way to export Allure Report to a single html file? To share with the team
                            
                                Dask prints warning to use client.scatter althought I'm using the suggested approach
                            
                                Limit Python function scope to local variables only
                            
                                Memory Error in Python 3 and Windows 64

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

pandas DataFrame resample from irregular timeseries index

Tags:

python

datetime

pandas

time-series

Tried solutions

The Question

Alarik

People also ask

1 Answers

Alarik

Recent Activity

Donate For Us