Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas groupby week given a datetime column

Let's say I have the following data sample:

df = pd.DataFrame({'date':['2011-01-01','2011-01-02',
                       '2011-01-03','2011-01-04','2011-01-05',
                       '2011-01-06','2011-01-07','2011-01-08',
                       '2011-01-09','2011-12-30','2011-12-31'],
                   'revenue':[5,3,2,
                              10,12,2,
                              1,0,6,10,12]})

# Let's format the date and add the week number and year
df['date'] = pd.to_datetime(df['date'],format='%Y-%m-%d')
df['week_number'] = df['date'].dt.week
df['year'] = df['date'].dt.year

df

        date        revenue     week_of_year    year
0       2011-01-01  5           52              2011
1       2011-01-02  3           52              2011
2       2011-01-03  2           1               2011
3       2011-01-04  10          1               2011
4       2011-01-05  12          1               2011
5       2011-01-06  2           1               2011
6       2011-01-07  1           1               2011
7       2011-01-08  0           1               2011
8       2011-01-09  6           1               2011
9       2011-12-30  10          52              2011
10      2011-12-31  12          52              2011

I would like to compute the revenue per week, in order to later plot the results, and analyze the time series. The expected output would then be something like that :

    week    revenue
0   1       8
1   2       33
2   52      22

I first thought of using the week number given by timestamp.week.
However, I can't figure out how to deal with the ISO week number definition for the week preceeding week number 1. I am a bit confused, since grouping by week_number would in that case sum both the revenue at the very beginning of the year, and those at the end of the year.

like image 740
plalanne Avatar asked Jun 29 '18 15:06

plalanne


2 Answers

When you convert using dt.week , it is ISO week date.

You can using strftime

df.groupby(df.date.dt.strftime('%W')).revenue.sum()
Out[588]: 
date
00     8
01    33
52    22
Name: revenue, dtype: int64
like image 68
BENY Avatar answered Nov 18 '22 10:11

BENY


You could use date column as index and then resample the time series.

df.index = pd.to_datetime(df['date'])
df.resample('W').sum()

With that solution you don't even need the week and year column.

like image 38
Alex G Avatar answered Nov 18 '22 09:11

Alex G