
Why does pandas roll a week forward when using resample with W-MON frequency?

Tags: python, pandas

As an example, the following code creates a dataframe whose index contains a single repeated value, the date '2018-03-06' (a Tuesday). Note that this date falls in the week starting 2018-03-05 (a Monday):

import numpy as np
import pandas as pd

values = [1, 1, 1]
dates = pd.to_datetime(np.repeat('2018-03-06', 3))
df = pd.DataFrame({
    'value': values
}, index=dates)
df.resample('W-MON').size()

which produces:

2018-03-12    3
Freq: W-MON, dtype: int64

Why does pandas roll the date forward one week? I would have expected the result to have been resampled to 2018-03-05 since that is the week during which the values were generated and I'm using freq='W-MON'.

UPDATE

As was pointed out, I needed to add the label argument to resample which defines which bin edge to use. Using label='left' solves the problem of bucketing the dates in the correct week except when the date falls on the start of the week (in this case, Monday). For example, if I apply resample to the date 2018-03-05 using label='left' then the resampled value is 2018-02-26 when it should be 2018-03-05.
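The Monday edge case can be pictured with explicit intervals. The following is only a sketch: it uses pd.Interval to model the weekly bins by hand (the variable names are illustrative and not part of resample itself), assuming that weekly bins are closed on the right unless closed='left' is passed.

```python
import pandas as pd

monday = pd.Timestamp('2018-03-05')
tuesday = pd.Timestamp('2018-03-06')
week = pd.Timedelta(days=7)

# Weekly bin closed on the right: (2018-02-26, 2018-03-05].
# The Monday itself is the last point of this bin.
right_closed = pd.Interval(monday - week, monday, closed='right')

# Weekly bin closed on the left: [2018-03-05, 2018-03-12).
# The Monday itself is the first point of this bin.
left_closed = pd.Interval(monday, monday + week, closed='left')

monday_in_right_closed = monday in right_closed
tuesday_in_right_closed = tuesday in right_closed
monday_in_left_closed = monday in left_closed
tuesday_in_left_closed = tuesday in left_closed

print(monday_in_right_closed, tuesday_in_right_closed)
print(monday_in_left_closed, tuesday_in_left_closed)
```

With right-closed bins the Monday belongs to the bin that ends on it, so labelling that bin by its left edge gives 2018-02-26. With left-closed bins the Monday and the Tuesday share one bin whose left edge is 2018-03-05.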

cdlm asked Mar 28 '18 21:03




2 Answers

Let's try using both label and closed (see the docs):

values = [1, 1, 1]
dates = pd.to_datetime(np.repeat('2018-03-06', 3))
df = pd.DataFrame({
    'value': values
}, index=dates)
df.resample('W-MON', label='left', closed='left').size()

Output:

2018-03-05    3
Freq: W-MON, dtype: int64

And,

values = [1, 1, 1]
dates = pd.to_datetime(np.repeat('2018-03-05', 3))
df = pd.DataFrame({
    'value': values
}, index=dates)
df.resample('W-MON', label='left', closed='left').size()

Output:

2018-03-05    3
Freq: W-MON, dtype: int64

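An interesting note about the docs: the signature states that 'closed' defaults to None, whereas the docstring states that 'closed' defaults to 'left'. The docstring lists 'W' among the exceptions that default to 'right', which is why closed='left' has to be passed explicitly for 'W-MON'.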
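To compare the options side by side, here is a small sketch that resamples a Tuesday and a Monday under three settings. The helper week_label is a hypothetical name introduced only for this illustration; it returns the label of the bin that actually holds the rows.

```python
import pandas as pd


def week_label(date, **resample_kwargs):
    """Return the 'W-MON' bin label (as a string) that `date` lands in.

    Hypothetical helper for illustration; extra keyword arguments are
    passed straight through to Series.resample.
    """
    s = pd.Series([1, 1, 1], index=pd.to_datetime([date] * 3))
    counts = s.resample('W-MON', **resample_kwargs).size()
    # Keep only the bin that contains the rows.
    return counts[counts > 0].index[0].strftime('%Y-%m-%d')


settings = {
    'default': {},
    'label=left': {'label': 'left'},
    'label=left, closed=left': {'label': 'left', 'closed': 'left'},
}

results = {}
for name, kwargs in settings.items():
    results[name] = (
        week_label('2018-03-06', **kwargs),  # a Tuesday
        week_label('2018-03-05', **kwargs),  # a Monday
    )
    print(f'{name:25s} Tue -> {results[name][0]}  Mon -> {results[name][1]}')
```

Only the combination label='left', closed='left' should map both the Tuesday and the Monday to 2018-03-05. With label='left' alone, the Monday is still assigned to the bin ending on it and is therefore labelled 2018-02-26.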

Scott Boston answered Oct 28 '22 07:10


I'm not sure why it's done this way and I agree that the behaviour you expected seems more intuitive. You can get your desired result by passing label='left' as a keyword parameter. The default value in this case was 'right'.

df.resample('W-MON', label='left').size()

From the documentation:

label : {‘right’, ‘left’}

Which bin edge label to label bucket with. The default is ‘left’ for all frequency offsets except for ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’ which all have a default of ‘right’.

I guess 'W-MON' still counts as 'W' which is why the default is 'right' and therefore your example gave a result of '2018-03-12' rather than '2018-03-05'.
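One way to check that guess is to compare the default call with a call that spells the defaults out. This is a minimal sketch, assuming the documented defaults quoted above (right for weekly frequencies, left for daily ones); the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 1, 1], index=pd.to_datetime(['2018-03-06'] * 3))

# Weekly frequency: the default should match label='right', closed='right'.
weekly_default = s.resample('W-MON').size()
weekly_explicit = s.resample('W-MON', label='right', closed='right').size()

# Daily frequency: the default should match label='left', closed='left'.
daily_default = s.resample('D').size()
daily_explicit = s.resample('D', label='left', closed='left').size()

weekly_matches = weekly_default.equals(weekly_explicit)
daily_matches = daily_default.equals(daily_explicit)

print(weekly_matches, daily_matches)
print(weekly_default.index[0].date(), daily_default.index[0].date())
```

If both comparisons hold, 'W-MON' is treated like 'W' for the purpose of the defaults, which is consistent with the 2018-03-12 label in the question.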

sjw answered Oct 28 '22 05:10