
Center datetimes of resampled time series

Tags: python, pandas

When I resample a Pandas time series to reduce the number of data points, the timestamp of each resulting datapoint is at the start of each resampling bin. When overplotting graphs with different resampling rates, this causes an apparent shift of the data. How can I "center" the timestamp of the resampled data in its bin, whatever the resample rate?

What I get now is (when resampling to one hour):

In [12]: d_r.head()
Out[12]: 
2017-01-01 00:00:00    0.330567
2017-01-01 01:00:00    0.846968
2017-01-01 02:00:00    0.965027
2017-01-01 03:00:00    0.629218
2017-01-01 04:00:00   -0.002522
Freq: H, dtype: float64

What I want is:

In [12]: d_r.head()
Out[12]: 
2017-01-01 00:30:00    0.330567
2017-01-01 01:30:00    0.846968
2017-01-01 02:30:00    0.965027
2017-01-01 03:30:00    0.629218
2017-01-01 04:30:00   -0.002522
Freq: H, dtype: float64

Minimal working example showing the apparent shift:

#!/usr/bin/env python3

import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
import seaborn
seaborn.set()

plt.ion()

# sample data
t = pd.date_range('2017-01-01 00:00', '2017-01-01 10:00', freq='1min')
d = pd.Series(np.sin(np.linspace(0, 7, len(t))), index=t)


d_r = d.resample('1h').mean()

d.plot()
d_r.plot()

[Figure: Apparent shift of resampled data]

Åsmund asked Nov 20 '17 15:11


2 Answers

I don't know of a general way to label each bin by its midpoint. There is the label parameter, but it only accepts 'left' and 'right'. However, in a concrete case like this one you can explicitly offset the resampled timestamps with the loffset parameter:

d.resample('1h', loffset='30min').mean()

(edit: Use 30min instead of 30T as that is more readable: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases)
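
In pandas versions where loffset is no longer available, the same offset can be added to the index after resampling. The following is a minimal sketch, assuming the resampling rule is a fixed-width frequency string that pd.Timedelta can parse (such as '1h' or '15min'). The helper name resample_centered is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def resample_centered(series, rule):
    """Resample with mean() and label each bin by its midpoint.

    Hypothetical helper; assumes `rule` is a fixed-width frequency
    string that pd.Timedelta can parse, e.g. '1h' or '15min'.
    """
    resampled = series.resample(rule).mean()
    # For these frequencies the default label is the left bin edge,
    # so adding half the bin width moves it to the bin centre.
    resampled.index = resampled.index + pd.Timedelta(rule) / 2
    return resampled


# Same sample data as in the question
t = pd.date_range('2017-01-01 00:00', '2017-01-01 10:00', freq='1min')
d = pd.Series(np.sin(np.linspace(0, 7, len(t))), index=t)

d_r = resample_centered(d, '1h')
print(d_r.head())
```

Because the offset is derived from the rule itself, the same call should work for other fixed-width rates, for example resample_centered(d, '15min'). The binning is unchanged; only the labels move.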

gahjelle answered Oct 18 '22 04:10


The loffset keyword argument has been deprecated since pandas 1.1 and was removed in pandas 2.0.

The nicest way I know of is the following:

d_r = d.shift(0.5, freq='1h').resample('1h').mean()

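Compared to using the loffset keyword, this has the advantage that the resulting timestamps are at full hours. The data is shifted forward by half a bin before resampling, so each bin is centered on its label: the value labelled 01:00 is the mean of the original samples from 00:30 up to (but not including) 01:30.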
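
A minimal sketch of the same idea that avoids a fractional periods value by passing the half-bin offset directly as freq. It reuses the sample data from the question; the variable names d_c and window are only illustrative. Note that, unlike relabelling the index, this changes which samples fall into each bin.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
t = pd.date_range('2017-01-01 00:00', '2017-01-01 10:00', freq='1min')
d = pd.Series(np.sin(np.linspace(0, 7, len(t))), index=t)

# Move every timestamp forward by half a bin, then resample as usual.
d_c = d.shift(freq='30min').resample('1h').mean()

# The bin labelled 01:00 now averages the original samples in [00:30, 01:30).
start = pd.Timestamp('2017-01-01 00:30')
end = pd.Timestamp('2017-01-01 01:30')
window = d[(d.index >= start) & (d.index < end)]

print(d_c.loc[pd.Timestamp('2017-01-01 01:00')], window.mean())
```

The first and last bins cover only half an hour of original data each, so their means are based on fewer samples than the interior bins.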

matthme answered Oct 18 '22 03:10
