
Pandas - Choose the starting day when resampling every 2 weeks

Tags:

python

pandas

Say I have the following time series, which starts on 2014-06-01, a Sunday.

In [7]:

import numpy.random as nr
import pandas as pd

# 2014-06-01 is a Sunday
df = pd.Series(index=pd.date_range('2014-06-01', periods=30), data=nr.randn(30))
df

I can resample weekly, starting on Sundays and closing on Saturdays:

In [9]:

df.resample('W-SAT').mean()
Out[9]:
2014-06-07    0.119460
2014-06-14    0.464789
2014-06-21   -1.211579
2014-06-28    0.650210
2014-07-05    0.666044
Freq: W-SAT, dtype: float64

OK, now I want to do the same thing but every 2 weeks, so I try this:

In [11]:

df.resample('2W-SAT').mean()
Out[11]:
2014-06-07    0.119460
2014-06-21   -0.373395
2014-07-05    0.653729
Freq: 2W-SAT, dtype: float64

Oh, the first bin covers only 1 week, and only the bins after it cover 2 weeks. That's not what I expected: I was expecting the first index entry to be 2014-06-14. Why is it doing that? How do I get the first 2 weeks to be resampled together?
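To see where the 2014-06-07 label comes from, it can help to count how many daily observations land in each 2-week bin. The sketch below assumes pandas 0.18 or later and replaces the random data with the values 1 to 30 so the result is reproducible. The names `s` and `counts` are illustrative:

```python
import pandas as pd

# Same index as above (2014-06-01 is a Sunday); values 1..30 are an
# illustrative stand-in for the random data so the output is reproducible.
s = pd.Series(range(1, 31), index=pd.date_range("2014-06-01", periods=30))

# '2W-SAT' defaults to closed='right', label='right', so each bin is (Sat, Sat].
counts = s.resample("2W-SAT").count()
print(counts)
# The first bin, labelled 2014-06-07, holds only the 7 days 2014-06-01..06-07,
# because its left edge is 2014-05-24, before the data starts.
```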

asked Jun 13 '14 14:06 by usual me


1 Answer

After trying the various options of resample, I think I have an explanation. The way resample chooses the first entry of the resampled index seems to depend on the closed option:

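  • when closed='left', resample looks for the latest possible start: the last Saturday on or before the first timestamp becomes the left edge of the first bin
  • when closed='right', resample looks for the earliest possible end: the first Saturday on or after the first timestamp becomes the right edge of the first bin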
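The two rules above can be reproduced with pandas' offset objects. This is a minimal sketch. It assumes that it mirrors the edge calculation pandas performs internally for non-fixed frequencies such as 2W-SAT. That calculation is an implementation detail, checked here against the outputs below rather than documented API behaviour. The variable names are illustrative:

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

first = pd.Timestamp("2014-06-01")  # first timestamp in the series (a Sunday)
offset = to_offset("2W-SAT")        # a Week offset with n=2, weekday=5 (Saturday)

# closed='left': roll back to the latest Saturday on or before the first
# timestamp; that Saturday is the left edge of the first bin.
left_edge = offset.rollback(first)

# closed='right' (assumption: mirrors pandas internals): step back one whole
# 2-week period; the first bin is then (right_start, right_start + 2 weeks],
# whose right edge is the first Saturday on or after the first timestamp.
right_start = first - offset
first_right_edge = right_start + offset

print(left_edge, right_start, first_right_edge)
```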

I will illustrate with an example:

# 2014-06-01 is a Sunday
df = pd.Series(index=pd.date_range('2014-06-01', periods=30), data=range(1, 31))
df

The following example illustrates the behaviour of closed='left'. For a 2-week interval closed on the left, the latest possible "left-side" Saturday is 2014-05-31, so the first bin is [2014-05-31, 2014-06-14) and holds the values 1 to 13:

df.resample('2W-SAT', closed='left', label='left').sum()
Out[119]:
2014-05-31     91
2014-06-14    287
2014-06-28     87
Freq: 2W-SAT, dtype: int64
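As a sanity check, the left-closed bins can be rebuilt by hand with plain boolean masks. The sketch below assumes the same 1 to 30 series. It takes the bin start dates from the output above. The name `manual` is illustrative:

```python
import pandas as pd

df = pd.Series(range(1, 31), index=pd.date_range("2014-06-01", periods=30))

# Left-closed 2-week bins [start, start + 14 days), starting on Saturday 2014-05-31.
starts = pd.date_range("2014-05-31", periods=3, freq="14D")
manual = pd.Series(
    [df[(df.index >= start) & (df.index < start + pd.Timedelta(days=14))].sum()
     for start in starts],
    index=starts,
)
print(manual)  # same sums as resample('2W-SAT', closed='left', label='left')
```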

The next example illustrates the behaviour of closed='right', which is what I didn't understand in my initial post (closed='right' is the default for weekly frequencies such as 2W-SAT). For a 2-week interval closed on the right, the earliest possible "right-side" Saturday is 2014-06-07, so the first bin is (2014-05-24, 2014-06-07] and holds only the values 1 to 7:

df.resample('2W-SAT', closed='right', label='right').sum()
Out[122]:
2014-06-07     28
2014-06-21    203
2014-07-05    234
Freq: 2W-SAT, dtype: int64
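The question also asks how to put the first 2 weeks in one bin. One possible workaround, which is not shown in the answer above, is a fixed '14D' frequency anchored at the first day of the data. This is a minimal sketch. It assumes pandas 1.1 or later, where the origin argument exists. Relabelling each bin by its last day is an illustrative choice that reproduces the expected first entry, 2014-06-14. The name `two_weeks` is hypothetical:

```python
import pandas as pd

df = pd.Series(range(1, 31), index=pd.date_range("2014-06-01", periods=30))

# '14D' is a fixed-length frequency, so bins are counted from `origin`
# ('start_day' = midnight of the first day, 2014-06-01) instead of being
# snapped to Saturdays. Here closed='left' and label='left' are the defaults,
# giving bins [06-01, 06-15), [06-15, 06-29), [06-29, 07-13).
two_weeks = df.resample("14D", origin="start_day").sum()

# Relabel each bin by its last day (the closing Saturday) instead of its first.
two_weeks.index = two_weeks.index + pd.Timedelta(days=13)
print(two_weeks)
```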
answered Sep 29 '22 07:09 by usual me