Python pandas time series interpolation and regularization

Tags: python, pandas, time-series, interpolation, regularized

I am using Python pandas for the first time. I have traffic data sampled every 5 minutes, stored in CSV format:

...
2015-01-04 08:29:05,271238
2015-01-04 08:34:05,329285
2015-01-04 08:39:05,-1
2015-01-04 08:44:05,260260
2015-01-04 08:49:05,263711
...
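Here is a minimal sketch for loading the sample into a Series. It assumes the file has no header row (the column names timestamp and traffic are made up) and that -1 is the only missing-value marker. Passing na_values=[-1] to pd.read_csv turns the -1 readings into NaN at load time, so no separate replacement step is needed. io.StringIO stands in for the real file path:

```python
import io

import pandas as pd

# A few rows copied from the sample; in practice pass the CSV file path instead.
csv_text = """2015-01-04 08:29:05,271238
2015-01-04 08:34:05,329285
2015-01-04 08:39:05,-1
2015-01-04 08:44:05,260260
2015-01-04 08:49:05,263711
"""

# Assumed: no header row, so the column names are supplied here.
# na_values=[-1] makes the -1 sentinel NaN while parsing.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["timestamp", "traffic"],
    parse_dates=["timestamp"],
    index_col="timestamp",
    na_values=[-1],
)
ts = df["traffic"]  # float64 Series with a DatetimeIndex
print(ts)
```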

There are several issues:

- for some timestamps the value is missing (recorded as -1);
- some entries are missing entirely (sometimes 2 or 3 consecutive hours);
- the interval between observations is not exactly 5 minutes: every so often the timestamps drift by a few seconds.

I would like to obtain a regular time series, with entries exactly every 5 minutes and no missing values. I have successfully interpolated the time series to approximate the -1 values with the following code:

ts = pd.Series(values, index=timestamps)  # pd.TimeSeries has since been removed from pandas
ts = ts.interpolate(method='cubic')  # interpolate() returns a new Series; 'cubic' requires SciPy
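One detail worth checking in that code: interpolate() only fills values pandas treats as missing (NaN), and it returns a new Series instead of modifying ts in place. A -1 is an ordinary number to pandas, so it has to be marked as missing first. This small sketch uses the first four rows of the sample:

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(["2015-01-04 08:29:05", "2015-01-04 08:34:05",
                      "2015-01-04 08:39:05", "2015-01-04 08:44:05"])
ts = pd.Series([271238, 329285, -1, 260260], index=idx)

# -1 is not NaN, so there is nothing for interpolate() to fill.
untouched = ts.interpolate()

# replace() returns a float64 copy with NaN where the -1 was,
# and interpolate() can then fill it.
filled = ts.replace(-1, np.nan).interpolate()

# ts itself is unchanged: both calls returned new Series.
print(untouched.iloc[2], filled.iloc[2], ts.iloc[2])
```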

How can I both interpolate and regularize the frequency of the observations? Thank you all for the help.

asked May 29 '15 12:05

riccamini

1 Answer

Change the -1s to NaNs:

ts = ts.replace(-1, np.nan)  # returns a float copy with NaN in place of each -1

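Then resample the data to a 5-minute frequency:

ts = ts.resample('5min').mean()

Each 5-minute bin is labeled by its start time (08:25:00, 08:30:00, ...), so the drifting seconds disappear. Because mean() is used to aggregate, two measurements that fall within the same 5-minute bin are averaged together, and bins that contain no measurements at all (such as the multi-hour gaps) become NaN.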
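A small sketch of that binning on made-up, irregular readings: two of them fall into the same 5-minute bin, and nothing arrives between 08:35 and 08:45. The numbers are hypothetical and chosen only to make the bins easy to follow:

```python
import pandas as pd

# Hypothetical readings: 08:25:10 and 08:29:05 share the 08:25 bin,
# and there is a gap with no readings from 08:35 to 08:45.
idx = pd.to_datetime(["2015-01-04 08:25:10", "2015-01-04 08:29:05",
                      "2015-01-04 08:34:05", "2015-01-04 08:45:02"])
ts = pd.Series([100.0, 200.0, 300.0, 400.0], index=idx)

# Bins start on exact 5-minute marks and are labeled by their left edge.
regular = ts.resample("5min").mean()
print(regular)
```

The 08:25 bin averages 100.0 and 200.0 to 150.0, and the 08:35 and 08:40 bins come out as NaN because no reading fell in them. Those NaN bins are what the interpolation step fills.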

Finally, you can interpolate linearly with respect to the timestamps:

ts = ts.interpolate(method='time')
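To see how method='time' differs from the default method='linear', here is a sketch on unevenly spaced, hypothetical readings. On an already regular 5-minute grid the two give the same result. The difference only appears when the spacing is uneven. The second part uses the limit parameter of interpolate, which is not part of the answer above. It is one possible option for the multi-hour gaps mentioned in the question: it caps how many consecutive NaNs get filled.

```python
import numpy as np
import pandas as pd

# Part 1: uneven spacing with one missing value (made-up numbers).
uneven = pd.Series(
    [0.0, np.nan, 300.0],
    index=pd.to_datetime(["2015-01-04 08:00", "2015-01-04 08:05",
                          "2015-01-04 08:20"]),
)
by_position = uneven.interpolate(method="linear")  # ignores the timestamps
by_time = uneven.interpolate(method="time")        # weights by elapsed time
print(by_position.iloc[1])  # 150.0: halfway between 0 and 300
print(by_time.iloc[1])      # about 75: 08:05 is 5 of the 20 minutes along

# Part 2: on a regular 5-minute grid, limit=2 fills at most two
# consecutive NaNs, so the last point of a long gap stays missing.
grid = pd.Series(
    [1.0, np.nan, np.nan, np.nan, 5.0],
    index=pd.date_range("2015-01-04 08:00", periods=5, freq="5min"),
)
capped = grid.interpolate(method="time", limit=2)
print(capped)
```

Leaving long outages as NaN, instead of drawing a straight line across hours of missing traffic, can be the safer choice. Whether that is appropriate depends on what the series is used for.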

Since your data already has roughly a 5-minute frequency, you may need to resample at a shorter frequency (1 minute below) so that cubic or spline interpolation has room to smooth out the curve:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

values = [271238, 329285, -1, 260260, 263711]
timestamps = pd.to_datetime(['2015-01-04 08:29:05',
                             '2015-01-04 08:34:05',
                             '2015-01-04 08:39:05',
                             '2015-01-04 08:44:05',
                             '2015-01-04 08:49:05'])

ts = pd.Series(values, index=timestamps)
ts = ts.replace(-1, np.nan)      # mark the -1 readings as missing
ts = ts.resample('min').mean()   # 1-minute bins; bins without readings are NaN

# method='spline' requires SciPy; order=3 gives a cubic spline
ts.interpolate(method='spline', order=3).plot(label='spline')
ts.interpolate(method='time').plot(label='time')
plt.legend(loc='best')
plt.show()
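To answer the original question of doing both steps at once, the ideas above can be combined into one function. This is a minimal sketch, and regularize is a hypothetical helper name. It bins to whole minutes, interpolates on that fine grid, and then keeps only the timestamps that fall exactly on the 5-minute marks. One caveat: binning moves each reading to the start of its minute (08:29:05 becomes 08:29:00), which is an approximation of up to a minute.

```python
import numpy as np
import pandas as pd


def regularize(ts: pd.Series, freq: str = "5min", **interp_kwargs) -> pd.Series:
    """Hypothetical helper: put ts on an exact `freq` grid with gaps filled.

    Marks -1 as missing, bins to whole minutes, interpolates on that
    1-minute grid, then keeps only the timestamps on the `freq` grid.
    """
    ts = ts.replace(-1, np.nan)          # -1 marks a missing reading
    fine = ts.resample("min").mean()     # whole-minute bins; empty bins are NaN
    fine = fine.interpolate(**interp_kwargs)
    # Exact grid (e.g. 08:30, 08:35, ...) inside the observed time span.
    grid = pd.date_range(fine.index[0].ceil(freq),
                         fine.index[-1].floor(freq), freq=freq)
    return fine.reindex(grid)


values = [271238, 329285, -1, 260260, 263711]
timestamps = pd.to_datetime(["2015-01-04 08:29:05", "2015-01-04 08:34:05",
                             "2015-01-04 08:39:05", "2015-01-04 08:44:05",
                             "2015-01-04 08:49:05"])
ts = pd.Series(values, index=timestamps)

result = regularize(ts, method="time")
print(result)
# With SciPy installed, a cubic spline can be used instead:
# result = regularize(ts, method="spline", order=3)
```

Interpolating on the 1-minute grid before sampling the 5-minute marks is what lets a spline bend between readings. With method="time", the values match a direct linear interpolation, apart from the small timestamp shift introduced by the minute bins.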

answered Sep 22 '22 13:09

unutbu
