
Resampling timestamps with pandas: Why are Mondays counted for a wrong week?

Tags: python, pandas

I'm having a hard time taking my first steps with pandas. I would like to create a bar chart showing how often something has happened per week. I want to identify each week by its first day, which is a Monday in my case. I also want to ensure that the last week displayed is always the current week, even if nothing has happened this week.

import datetime
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.ticker import MultipleLocator

# create some example data frame
timestamps = [
    "2021-01-08 11:21:14",
    "2021-02-15 08:04:46",
    "2021-02-18 16:49:39",
    "2021-02-24 11:59:39",
    "2021-03-03 08:29:39",
]
df = pd.DataFrame(dict(timestamp=timestamps))
df.timestamp = pd.to_datetime(df.timestamp)

# ensure that this week is contained
# (DataFrame.append was removed in pandas 2.0, so use pd.concat instead)
df = pd.concat([df, pd.DataFrame(dict(timestamp=[datetime.datetime.now()]))], ignore_index=True)

# process data to histogram
# TODO: Mondays are counted as the week before
histogram_df = df.resample('W-MON', label='left', on='timestamp').count()

# remove fake entry which I added to ensure that the current week appears
histogram_df.iloc[-1, histogram_df.columns.get_loc('timestamp')] -= 1

# plot the data
ax = histogram_df.plot(y='timestamp', legend=False, kind='bar', rot=0, title='number')
ax.set_xlabel('')
ax.set_xticklabels(map(lambda t: t.strftime('KW %V\n%d.%m.%Y'), histogram_df.index))
ax.yaxis.set_major_locator(MultipleLocator(1))

plt.show()

This does almost what I want, but Mondays (see February 15th) are counted toward the previous week. Why is that? How can I get Mondays to be counted toward the week they are in?

The documentation of resample does not say which str values its first argument, rule, accepts or what they mean. 'W-MON' is mentioned here but without much explanation.

My initial understanding was 'W-MON' means "weekly with weeks starting on Mondays" and label='left' means "take the first day of the week instead of the last day". But that has proven to be wrong. So what do 'W-MON' and label='left' really mean?

Bonus question: Appending a row to ensure that the current week appears in the diagram and then decrementing the last count is not exactly safe in case a value from the future were to appear in the data. Is there a better way to do this?

jakun asked Oct 19 '25 15:10


1 Answer

Try also passing closed='left' to the df.resample() call, as below:

histogram_df = df.resample('W-MON', label='left', closed='left', on='timestamp').count()
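
As a rough check on the question's sample data (a sketch, not something the library documentation spells out this way): 'W-MON' anchors each weekly bin so that it ends on a Monday, and with the default closed='right' that closing Monday belongs to the bin. February 15th is therefore counted together with the preceding Tuesday through Sunday, and label='left' only changes which edge is used as the label, not which rows fall into the bin. The Series name events is made up for this illustration.

```python
import pandas as pd

# Sample timestamps from the question; 2021-02-15 is a Monday.
timestamps = pd.to_datetime([
    "2021-01-08 11:21:14",
    "2021-02-15 08:04:46",
    "2021-02-18 16:49:39",
    "2021-02-24 11:59:39",
    "2021-03-03 08:29:39",
])
events = pd.Series(1, index=timestamps)  # hypothetical helper series, one row per event

# A 'W-MON' period is the week that ENDS on a Monday (Tuesday through Monday).
week = pd.Period("2021-02-15", freq="W-MON")
print(week.start_time.date(), week.end_time.date())  # 2021-02-09 2021-02-15

# Default for weekly rules is closed='right': the Monday sits at the end of a bin.
default_counts = events.resample("W-MON", label="left").count()
# closed='left': bins run from Monday 00:00 up to (not including) the next Monday.
fixed_counts = events.resample("W-MON", label="left", closed="left").count()

previous_monday = pd.Timestamp("2021-02-08")
monday = pd.Timestamp("2021-02-15")
print(default_counts[previous_monday], default_counts[monday])  # 1 1
print(fixed_counts[previous_monday], fixed_counts[monday])      # 0 2
```

With closed='left' both the Monday event and the Thursday event of that week end up under the label 2021-02-15, which is the behaviour the question asks for.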

From the documentation of the closed parameter (quoted below), the default is 'right' for weekly frequencies such as 'W' and 'W-MON':

closed : {‘right’, ‘left’}, default None

Which side of bin interval is closed. The default is ‘left’ for all frequency offsets except for ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’ which all have a default of ‘right’.
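
For the bonus question, one possible approach (my own assumption, the quoted documentation does not cover it) is to skip the fake row and instead reindex the weekly counts onto a range of Mondays that runs through the current week, filling missing weeks with 0. The helper name weekly_counts and its now parameter are hypothetical and exist only for this sketch; now can be fixed to make the result reproducible.

```python
import pandas as pd

def weekly_counts(timestamps, now=None):
    """Count events per Monday-based week, zero-padded up to the week containing `now`.

    Hypothetical helper for illustration; `now` defaults to the current local time.
    """
    now = pd.Timestamp.now() if now is None else pd.Timestamp(now)
    events = pd.Series(1, index=pd.to_datetime(timestamps)).sort_index()

    # Bins run from Monday 00:00 (inclusive) to the next Monday (exclusive).
    counts = events.resample("W-MON", label="left", closed="left").count()

    # Monday 00:00 of the week that contains `now`.
    this_monday = now.normalize() - pd.Timedelta(days=now.weekday())

    # Extend to the current week; keep later weeks if the data contains future dates.
    last = max(this_monday, counts.index[-1])
    full_index = pd.date_range(counts.index[0], last, freq="W-MON")
    return counts.reindex(full_index, fill_value=0)

sample = [
    "2021-01-08 11:21:14",
    "2021-02-15 08:04:46",
    "2021-02-18 16:49:39",
    "2021-02-24 11:59:39",
    "2021-03-03 08:29:39",
]
result = weekly_counts(sample, now="2021-03-17 10:00")
print(result.tail(3))
```

Because nothing is appended and later subtracted, a timestamp from the future simply shows up in its own week instead of corrupting the last count.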

SeaBean answered Oct 22 '25 05:10


