Creating a pivot table in pandas while grouping the dates by week

I want to create a pd.pivot_table in Python where one column holds dates, and I also want to group the results on a weekly basis. Here's a simple example. I have the following DataFrame:

import pandas as pd

names = ['a', 'b', 'c', 'd'] * 7
dates = ['2017-01-11', '2017-01-08', '2017-01-14', '2017-01-05', '2017-01-10', '2017-01-13', '2017-01-02', '2017-01-12', '2017-01-10', '2017-01-05', '2017-01-01', '2017-01-04', '2017-01-11', '2017-01-14', '2017-01-05', '2017-01-06', '2017-01-14', '2017-01-11', '2017-01-06', '2017-01-05', '2017-01-08', '2017-01-10', '2017-01-07', '2017-01-04', '2017-01-02', '2017-01-04', '2017-01-01', '2017-01-12']
dates = [pd.to_datetime(i).date() for i in dates]
numbers = [4, 3, 2, 1 ] * 7
data = {'name': names , 'date': dates, 'number': numbers}

df = pd.DataFrame(data)

which yields:

          date name  number
0   2017-01-11    a       4
1   2017-01-08    b       3
2   2017-01-14    c       2
3   2017-01-05    d       1
4   2017-01-10    a       4
5   2017-01-13    b       3
6   2017-01-02    c       2
7   2017-01-12    d       1
8   2017-01-10    a       4
9   2017-01-05    b       3
10  2017-01-01    c       2
11  2017-01-04    d       1
12  2017-01-11    a       4
13  2017-01-14    b       3
14  2017-01-05    c       2
15  2017-01-06    d       1
16  2017-01-14    a       4
17  2017-01-11    b       3
18  2017-01-06    c       2
19  2017-01-05    d       1
20  2017-01-08    a       4
21  2017-01-10    b       3
22  2017-01-07    c       2
23  2017-01-04    d       1
24  2017-01-02    a       4
25  2017-01-04    b       3
26  2017-01-01    c       2
27  2017-01-12    d       1

I want to create a pivot table where the rows are the names, the columns are the dates grouped by week, and the values are the sums of the number column. For example, the first row of the pivot table would be:

   2017-01-01  2017-01-08  2017-01-15  ...
a           4          24           0

What I am doing is:

pd.pivot_table(data=df, values='number', columns=pd.Grouper(key='date', freq='1W'), index='name', aggfunc=sum)

but I get the error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'.

How am I supposed to do that? I don't know if I can use the date as an index, since the date values are not all unique.

thanasissdr asked Dec 17 '17 23:12


2 Answers

IIUC:

First, make sure that the date column has a datetime dtype (the .date() call in the question leaves it holding plain datetime.date objects):

df['date'] = pd.to_datetime(df['date'], errors='coerce')
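
To see why the conversion matters, the dtype can be checked before and after it. This is a minimal sketch on three of the question's dates; the names `raw` and `df_demo` are made up for illustration:

```python
import pandas as pd

# Built the same way as in the question: .date() returns plain datetime.date objects
raw = ['2017-01-11', '2017-01-08', '2017-01-14']
df_demo = pd.DataFrame({'date': [pd.to_datetime(s).date() for s in raw]})
is_datetime_before = pd.api.types.is_datetime64_any_dtype(df_demo['date'])

# Convert back to pandas datetimes so that pd.Grouper(freq=...) can bin the column
df_demo['date'] = pd.to_datetime(df_demo['date'], errors='coerce')
is_datetime_after = pd.api.types.is_datetime64_any_dtype(df_demo['date'])

print(is_datetime_before, is_datetime_after)  # False True
```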

Then you can group, sum and unstack:

In [289]: (df.groupby(['name', pd.Grouper(freq='W', key='date')])
             ['number']
             .sum()
             .unstack(fill_value=0))
Out[289]:
date  2017-01-01  2017-01-08  2017-01-15
name
a              0           8          20
b              0           9          12
c              4           8           2
d              0           5           2
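
With the dtype fixed, the `pd.pivot_table` call from the question should work as well. The sketch below assumes a reasonably recent pandas version; `fill_value=0` and the string `'sum'` (instead of the builtin `sum`) are choices made here, not part of the original call:

```python
import pandas as pd

names = ['a', 'b', 'c', 'd'] * 7
dates = [
    '2017-01-11', '2017-01-08', '2017-01-14', '2017-01-05',
    '2017-01-10', '2017-01-13', '2017-01-02', '2017-01-12',
    '2017-01-10', '2017-01-05', '2017-01-01', '2017-01-04',
    '2017-01-11', '2017-01-14', '2017-01-05', '2017-01-06',
    '2017-01-14', '2017-01-11', '2017-01-06', '2017-01-05',
    '2017-01-08', '2017-01-10', '2017-01-07', '2017-01-04',
    '2017-01-02', '2017-01-04', '2017-01-01', '2017-01-12',
]
numbers = [4, 3, 2, 1] * 7

# Keep the dates as pandas datetimes (no .date() call), so the column is datetime64
df = pd.DataFrame({'name': names, 'date': pd.to_datetime(dates), 'number': numbers})

# Same call as in the question; weeks end on Sunday and are labelled by that Sunday
table = pd.pivot_table(
    data=df,
    values='number',
    index='name',
    columns=pd.Grouper(key='date', freq='W'),
    aggfunc='sum',
    fill_value=0,
)
print(table)
```

The printed table should contain the same sums as the groupby/unstack output above.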

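or, as proposed by @thanasissdr, with closed='left', so that each week runs from Sunday up to (but not including) the following Sunday: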

In [328]: (df.groupby(['name', pd.Grouper(freq='W', key='date', closed='left')])
             ['number']
             .sum()
             .unstack(fill_value=0))
Out[328]:
date  2017-01-08  2017-01-15
name
a              4          24
b              6          15
c             12           2
d              5           2

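or, to label each week by its first day instead of the closing edge of the bin, shift the dates back by 7 days first: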

In [330]: (df.assign(date=df['date']-pd.offsets.Day(7))
     ...:    .groupby(['name', pd.Grouper(freq='W', key='date', closed='left')])
     ...:    ['number']
     ...:    .sum()
     ...:    .unstack(fill_value=0))
     ...:
Out[330]:
date  2017-01-01  2017-01-08
name
a              4          24
b              6          15
c             12           2
d              5           2
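
As an alternative to shifting the dates, `pd.Grouper` also accepts a `label` argument. The sketch below assumes that `label='left'` combined with `closed='left'` labels each week by its starting Sunday, which is my reading of the resampling rules rather than something stated in the answer:

```python
import pandas as pd

names = ['a', 'b', 'c', 'd'] * 7
dates = [
    '2017-01-11', '2017-01-08', '2017-01-14', '2017-01-05',
    '2017-01-10', '2017-01-13', '2017-01-02', '2017-01-12',
    '2017-01-10', '2017-01-05', '2017-01-01', '2017-01-04',
    '2017-01-11', '2017-01-14', '2017-01-05', '2017-01-06',
    '2017-01-14', '2017-01-11', '2017-01-06', '2017-01-05',
    '2017-01-08', '2017-01-10', '2017-01-07', '2017-01-04',
    '2017-01-02', '2017-01-04', '2017-01-01', '2017-01-12',
]
numbers = [4, 3, 2, 1] * 7
df = pd.DataFrame({'name': names, 'date': pd.to_datetime(dates), 'number': numbers})

# closed='left': bins are [Sunday, next Sunday); label='left': name each bin by its first day
weekly = (df.groupby(['name', pd.Grouper(freq='W', key='date', closed='left', label='left')])
            ['number']
            .sum()
            .unstack(fill_value=0))
print(weekly)
```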
MaxU - stop WAR against UA answered Sep 23 '22 11:09


Continuing with the logic from my question, we can create a multi-index in which the date is one of the levels, and then group on that level. So we can have:

import pandas as pd

names = ['a', 'b', 'c', 'd'] * 7
dates = ['2017-01-11', '2017-01-08', '2017-01-14', '2017-01-05', '2017-01-10', '2017-01-13', '2017-01-02', '2017-01-12', '2017-01-10', '2017-01-05', '2017-01-01', '2017-01-04', '2017-01-11', '2017-01-14', '2017-01-05', '2017-01-06', '2017-01-14', '2017-01-11', '2017-01-06', '2017-01-05', '2017-01-08', '2017-01-10', '2017-01-07', '2017-01-04', '2017-01-02', '2017-01-04', '2017-01-01', '2017-01-12']
dates = [pd.to_datetime(i).date() for i in dates]
numbers = [4, 3, 2, 1 ] * 7
data = {'name': names , 'date': dates, 'number': numbers}

df = pd.DataFrame(data)

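# Grouper(freq=...) needs real datetimes, not datetime.date objects
df['date'] = pd.to_datetime(df['date'])
# move 'date' into the index, next to the original row labels
df.set_index([df.index, 'date'], inplace=True)

print(pd.pivot_table(data=df, columns=pd.Grouper(freq='7D', level='date', closed='left'), index='name', aggfunc='sum'))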

which yields exactly:

         number           
date 2017-01-01 2017-01-08
name                      
a             4         24
b             6         15
c            12          2
d             5          2
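
On the concern raised in the question about non-unique dates: as far as I know, an index used for time-based grouping does not have to be unique, so a plain `set_index('date')` may be enough. A minimal sketch, where `indexed` and `weekly_from_index` are illustrative names and `label='left'` is my own choice for labelling the weeks:

```python
import pandas as pd

names = ['a', 'b', 'c', 'd'] * 7
dates = [
    '2017-01-11', '2017-01-08', '2017-01-14', '2017-01-05',
    '2017-01-10', '2017-01-13', '2017-01-02', '2017-01-12',
    '2017-01-10', '2017-01-05', '2017-01-01', '2017-01-04',
    '2017-01-11', '2017-01-14', '2017-01-05', '2017-01-06',
    '2017-01-14', '2017-01-11', '2017-01-06', '2017-01-05',
    '2017-01-08', '2017-01-10', '2017-01-07', '2017-01-04',
    '2017-01-02', '2017-01-04', '2017-01-01', '2017-01-12',
]
numbers = [4, 3, 2, 1] * 7
df = pd.DataFrame({'name': names, 'date': pd.to_datetime(dates), 'number': numbers})

# The resulting DatetimeIndex contains repeated dates
indexed = df.set_index('date')
print(indexed.index.is_unique)  # False

# A Grouper without key= or level= bins on the index itself
weekly_from_index = (indexed.groupby(['name', pd.Grouper(freq='W', closed='left', label='left')])
                            ['number']
                            .sum()
                            .unstack(fill_value=0))
print(weekly_from_index)
```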
thanasissdr answered Sep 22 '22 11:09