
Resampling a pandas DataFrame with loffset introduces an additional offset of an hour

Tags: python, pandas

I have a DataFrame with a DatetimeIndex (irregular intervals, with time zone information) and two value columns:

In:  df.head()
Out: 
                                      v1    v2
2014-01-18 00:00:00.842537+01:00  130107  7958
2014-01-18 00:00:00.858443+01:00  130251  7958
2014-01-18 00:00:00.874054+01:00  130476  7958
2014-01-18 00:00:00.889617+01:00  130250  7958
2014-01-18 00:00:00.905163+01:00  130327  7958

In:  df.index
Out:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-18 00:00:00.842537984, ..., 2014-01-18 00:10:00.829031936]
Length: 38558, Freq: None, Timezone: Europe/Berlin

If I resample this DataFrame at any frequency, the time zone is kept:

In : df_3.resample('1S', 'mean').head()
Out: 
                                      v1           v2
2014-01-18 00:00:00+01:00  130311.090909  7958.000000
2014-01-18 00:00:01+01:00  130385.125000  7958.000000
2014-01-18 00:00:02+01:00  130332.593750  7957.000000
2014-01-18 00:00:03+01:00  130377.061538  7957.307692
2014-01-18 00:00:04+01:00  130384.171875  7957.640625

As soon as I pass any loffset, the timestamps are shifted back by an additional hour:

In : df_3.resample('1S', 'mean', loffset='1S').head()
Out: 
                                      v1           v2
2014-01-17 23:00:01+01:00  130311.090909  7958.000000
2014-01-17 23:00:02+01:00  130385.125000  7958.000000
2014-01-17 23:00:03+01:00  130332.593750  7957.000000
2014-01-17 23:00:04+01:00  130377.061538  7957.307692
2014-01-17 23:00:05+01:00  130384.171875  7957.640625

This happens even when I explicitly pass an "empty" offset:

In : df_3.resample('1S', 'mean', loffset='0S').head()
Out: 
                                      v1           v2
2014-01-17 23:00:01+01:00  130311.090909  7958.000000
2014-01-17 23:00:02+01:00  130385.125000  7958.000000
2014-01-17 23:00:03+01:00  130332.593750  7957.000000
2014-01-17 23:00:04+01:00  130377.061538  7957.307692
2014-01-17 23:00:05+01:00  130384.171875  7957.640625

To get the correct timestamps, I have to add this hour back into the offset:

In : df_3.resample('1S', 'mean', loffset='1H1S').head()
Out: 
                                      v1           v2
2014-01-18 00:00:01+01:00  130311.090909  7958.000000
2014-01-18 00:00:02+01:00  130385.125000  7958.000000
2014-01-18 00:00:03+01:00  130332.593750  7957.000000
2014-01-18 00:00:04+01:00  130377.061538  7957.307692
2014-01-18 00:00:05+01:00  130384.171875  7957.640625

Why is this happening? Am I missing something?

Julius Bullinger asked Nov 10 '22 14:11


1 Answer

To answer my own question, since it is still visited frequently: this was a bug in pandas, and it has been fixed in version 0.16.
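For readers on a current pandas release, note that the calls in the question no longer run as written: passing the aggregation as a second argument to resample and the loffset keyword have both since been removed (loffset was deprecated in pandas 1.1 and dropped in 2.0). A minimal sketch of one way to get the same result today is to resample first and then shift the resulting index. The data below is a synthetic stand-in, not the original dataset, and the names `resampled` and `shifted` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (assumption, not the original file):
# irregularly spaced, tz-aware timestamps starting just after midnight.
start = pd.Timestamp("2014-01-18 00:00:00", tz="Europe/Berlin")
offsets_ms = np.cumsum(np.tile([13, 17, 29, 41], 50))  # irregular gaps, ~5 s total
index = start + pd.to_timedelta(offsets_ms, unit="ms")
df = pd.DataFrame({"v1": np.arange(200), "v2": 7958}, index=index)

# Modern spelling of resample('1S', 'mean'): aggregate via a method call.
resampled = df.resample("1s").mean()

# Replacement for loffset='1S': shift the bin labels after resampling.
shifted = resampled.copy()
shifted.index = resampled.index + pd.Timedelta(seconds=1)

print(resampled.head())
print(shifted.head())
```

With this approach the labels should move by exactly the requested offset and keep their Europe/Berlin time zone, with no extra hour to compensate for.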

Julius Bullinger answered Nov 14 '22 23:11