When I resample a Pandas time series to reduce the number of data points, the timestamp of each resulting datapoint is at the start of each resampling bin. When overplotting graphs with different resampling rates, this causes an apparent shift of the data. How can I "center" the timestamp of the resampled data in its bin, whatever the resample rate?
What I get now is (when resampling to one hour):
In [12]: d_r.head()
Out[12]:
2017-01-01 00:00:00 0.330567
2017-01-01 01:00:00 0.846968
2017-01-01 02:00:00 0.965027
2017-01-01 03:00:00 0.629218
2017-01-01 04:00:00 -0.002522
Freq: H, dtype: float64
What I want is:
In [12]: d_r.head()
Out[12]:
2017-01-01 00:30:00 0.330567
2017-01-01 01:30:00 0.846968
2017-01-01 02:30:00 0.965027
2017-01-01 03:30:00 0.629218
2017-01-01 04:30:00 -0.002522
Freq: H, dtype: float64
MWE showing the apparent shift:
#!/usr/bin/env python3
# Minimal working example
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
import seaborn
seaborn.set()
plt.ion()
# sample data
t = pd.date_range('2017-01-01 00:00', '2017-01-01 10:00', freq='1min')
d = pd.Series(np.sin(np.linspace(0, 7, len(t))), index=t)
d_r = d.resample('1h').mean()
d.plot()
d_r.plot()
I don't know how to use the midpoint in general. There is the label parameter, but it only accepts 'right' and 'left'. However, in a concrete case like this one you can explicitly offset the resampled timestamps with the loffset parameter:
d.resample('1h', loffset='30min').mean()
(edit: Use 30min instead of 30T, as that is more readable: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases)
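On pandas versions where loffset is not available, the same effect can be had by adding half the bin width to the index after resampling, which also works for any fixed resample rate. This is only a minimal sketch: resample_centered is a hypothetical helper name, and it assumes the rule is a fixed-width string with an explicit count (such as '1h' or '15min') that pd.Timedelta can parse.

```python
import numpy as np
import pandas as pd


def resample_centered(series, rule):
    """Resample with mean() and label each bin by its midpoint.

    Hypothetical helper. Assumes `rule` is a fixed-width frequency string
    with an explicit count that pd.Timedelta can parse, e.g. '1h' or '15min'.
    """
    resampled = series.resample(rule).mean()
    # Labels sit on the left bin edge by default; move them forward
    # by half the bin width so they land on the bin midpoint.
    resampled.index = resampled.index + pd.Timedelta(rule) / 2
    return resampled


# Same sample data as in the question
t = pd.date_range('2017-01-01 00:00', '2017-01-01 10:00', freq='1min')
d = pd.Series(np.sin(np.linspace(0, 7, len(t))), index=t)

d_c = resample_centered(d, '1h')
```

The values are the same as those of d.resample('1h').mean(); only the labels move, so d.plot() and d_c.plot() should line up when overplotted.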
The loffset keyword argument was deprecated in pandas 1.1 and removed in pandas 2.0.
In my opinion, the nicest way to do it that I know of is the following:
d_r = d.shift(0.5, freq='1h').resample('1h').mean()
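Compared to using the loffset keyword, this has the advantage that the resulting timestamps are at full hours. Note that the bins themselves change: each value is now the mean of the half hour on either side of its label, rather than of the hour following it.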
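To check what the shift does to the bins, here is a small sketch on the sample data from the question. It uses integer periods with freq='30min', which I assume is equivalent to shift(0.5, freq='1h'); the variable names window and expected are just for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
t = pd.date_range('2017-01-01 00:00', '2017-01-01 10:00', freq='1min')
d = pd.Series(np.sin(np.linspace(0, 7, len(t))), index=t)

# Move every timestamp forward by 30 minutes, then resample.
# The labels stay on full hours.
d_s = d.shift(1, freq='30min').resample('1h').mean()

# The bin labelled 01:00 should average the original samples
# in the half-open interval [00:30, 01:30).
window = (d.index >= pd.Timestamp('2017-01-01 00:30')) & (
    d.index < pd.Timestamp('2017-01-01 01:30')
)
expected = d[window].mean()
```

So the timestamps are centered here too, but because the bin edges move rather than the labels, the values differ from those of a plain d.resample('1h').mean().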