I want to create a pd.pivot_table in Python where one column is a datetime object, but I also want to group my results on a weekly basis. Here's a simple example. I have the following DataFrame:
import pandas as pd
names = ['a', 'b', 'c', 'd'] * 7
dates = ['2017-01-11', '2017-01-08', '2017-01-14', '2017-01-05', '2017-01-10', '2017-01-13', '2017-01-02', '2017-01-12', '2017-01-10', '2017-01-05', '2017-01-01', '2017-01-04', '2017-01-11', '2017-01-14', '2017-01-05', '2017-01-06', '2017-01-14', '2017-01-11', '2017-01-06', '2017-01-05', '2017-01-08', '2017-01-10', '2017-01-07', '2017-01-04', '2017-01-02', '2017-01-04', '2017-01-01', '2017-01-12']
dates = [pd.to_datetime(i).date() for i in dates]
numbers = [4, 3, 2, 1] * 7
data = {'name': names, 'date': dates, 'number': numbers}
df = pd.DataFrame(data)
which yields:
date name number
0 2017-01-11 a 4
1 2017-01-08 b 3
2 2017-01-14 c 2
3 2017-01-05 d 1
4 2017-01-10 a 4
5 2017-01-13 b 3
6 2017-01-02 c 2
7 2017-01-12 d 1
8 2017-01-10 a 4
9 2017-01-05 b 3
10 2017-01-01 c 2
11 2017-01-04 d 1
12 2017-01-11 a 4
13 2017-01-14 b 3
14 2017-01-05 c 2
15 2017-01-06 d 1
16 2017-01-14 a 4
17 2017-01-11 b 3
18 2017-01-06 c 2
19 2017-01-05 d 1
20 2017-01-08 a 4
21 2017-01-10 b 3
22 2017-01-07 c 2
23 2017-01-04 d 1
24 2017-01-02 a 4
25 2017-01-04 b 3
26 2017-01-01 c 2
27 2017-01-12 d 1
I want to create a pivot table where the rows are the names, the columns are the dates grouped by week, and the values are the sums of the number column. For example, the first row of the pivot table should be:
2017-01-01 2017-01-08 2017-01-15 ...
a 4 24 0
What I am doing is:
pd.pivot_table(data=df, values='number', columns=pd.Grouper(key='date', freq='1W'), index='name', aggfunc=sum)
but I get this error:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'.
How am I supposed to do this? I don't know whether I can use the date as an index, since the date values are not unique.
IIUC, first make sure that the date column is of datetime dtype:
df['date'] = pd.to_datetime(df['date'], errors='coerce')
Then you can group, sum, and unstack:
In [289]: (df.groupby(['name', pd.Grouper(freq='W', key='date')])
['number']
.sum()
.unstack(fill_value=0))
Out[289]:
date 2017-01-01 2017-01-08 2017-01-15
name
a 0 8 20
b 0 9 12
c 4 8 2
d 0 5 2
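Once the column is datetime64, the pd.pivot_table call from the question should also work, at least in recent pandas releases (an assumption: the RangeIndex error in the question may come from an older version's pivot_table). A minimal sketch that rebuilds the question's data, converts the column, and passes a weekly Grouper; aggfunc='sum' and fill_value=0 are my choices, not the question's. It should reproduce the groupby/unstack table above:

```python
import pandas as pd

names = ['a', 'b', 'c', 'd'] * 7
dates = ['2017-01-11', '2017-01-08', '2017-01-14', '2017-01-05', '2017-01-10', '2017-01-13', '2017-01-02',
         '2017-01-12', '2017-01-10', '2017-01-05', '2017-01-01', '2017-01-04', '2017-01-11', '2017-01-14',
         '2017-01-05', '2017-01-06', '2017-01-14', '2017-01-11', '2017-01-06', '2017-01-05', '2017-01-08',
         '2017-01-10', '2017-01-07', '2017-01-04', '2017-01-02', '2017-01-04', '2017-01-01', '2017-01-12']
numbers = [4, 3, 2, 1] * 7

# Same setup as the question: 'date' holds Python date objects (object dtype)
df = pd.DataFrame({'name': names,
                   'date': [pd.to_datetime(d).date() for d in dates],
                   'number': numbers})

# Convert to datetime64 so the weekly Grouper has a datetime axis to bin
df['date'] = pd.to_datetime(df['date'])

# Rows: names; columns: weeks ending on Sunday ('W' is 'W-SUN'); values: summed numbers
table = pd.pivot_table(df, values='number', index='name',
                       columns=pd.Grouper(key='date', freq='W'),
                       aggfunc='sum', fill_value=0)
print(table)
```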
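Or, as proposed by @thanasissdr, use closed='left' so that each Sunday opens a new week (the columns are still labelled by the right edge of each bin):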
In [328]: (df.groupby(['name', pd.Grouper(freq='W', key='date', closed='left')])
['number']
.sum()
.unstack(fill_value=0))
Out[328]:
date 2017-01-08 2017-01-15
name
a 4 24
b 6 15
c 12 2
d 5 2
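Or shift the dates back by seven days first, so that each column is labelled by the Sunday on which its week starts: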
In [330]: (df.assign(date=df['date'] - pd.offsets.Day(7))
.groupby(['name', pd.Grouper(freq='W', key='date', closed='left')])
['number']
.sum()
.unstack(fill_value=0))
Out[330]:
date 2017-01-01 2017-01-08
name
a 4 24
b 6 15
c 12 2
d 5 2
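Shifting the dates is one way to get week-start labels; another, not taken in the answers above, is to pass label='left' together with closed='left', so each bin [Sunday, next Sunday) is named by its left edge. A minimal sketch on the question's data, assuming the date column is already datetime64:

```python
import pandas as pd

names = ['a', 'b', 'c', 'd'] * 7
dates = ['2017-01-11', '2017-01-08', '2017-01-14', '2017-01-05', '2017-01-10', '2017-01-13', '2017-01-02',
         '2017-01-12', '2017-01-10', '2017-01-05', '2017-01-01', '2017-01-04', '2017-01-11', '2017-01-14',
         '2017-01-05', '2017-01-06', '2017-01-14', '2017-01-11', '2017-01-06', '2017-01-05', '2017-01-08',
         '2017-01-10', '2017-01-07', '2017-01-04', '2017-01-02', '2017-01-04', '2017-01-01', '2017-01-12']
numbers = [4, 3, 2, 1] * 7
df = pd.DataFrame({'name': names, 'date': pd.to_datetime(dates), 'number': numbers})

# closed='left': a Sunday belongs to the week it starts
# label='left': each weekly bin is named by that starting Sunday
weekly = (df.groupby(['name', pd.Grouper(key='date', freq='W', closed='left', label='left')])
          ['number']
          .sum()
          .unstack(fill_value=0))
print(weekly)
```

This avoids the extra assign step, and the labels should match the shifted version above (2017-01-01 and 2017-01-08).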
Continuing with my own approach, we can create a MultiIndex in which the date is one of the levels:
import pandas as pd
names = ['a', 'b', 'c', 'd'] * 7
dates = ['2017-01-11', '2017-01-08', '2017-01-14', '2017-01-05', '2017-01-10', '2017-01-13', '2017-01-02', '2017-01-12', '2017-01-10', '2017-01-05', '2017-01-01', '2017-01-04', '2017-01-11', '2017-01-14', '2017-01-05', '2017-01-06', '2017-01-14', '2017-01-11', '2017-01-06', '2017-01-05', '2017-01-08', '2017-01-10', '2017-01-07', '2017-01-04', '2017-01-02', '2017-01-04', '2017-01-01', '2017-01-12']
# Keep the dates as datetime64, not Python date objects, so the time-based Grouper can bin them
dates = pd.to_datetime(dates)
numbers = [4, 3, 2, 1] * 7
data = {'name': names, 'date': dates, 'number': numbers}
df = pd.DataFrame(data)
# Pair the original row number with the date so each index entry stays unique
df.set_index([df.index, df.date], inplace=True)
# values=['number'] (a list) aggregates only that column and keeps it as the top column level
print(pd.pivot_table(data=df, values=['number'],
                     columns=pd.Grouper(freq='7D', level='date', closed='left'),
                     index='name', aggfunc='sum'))
which yields exactly:
number
date 2017-01-01 2017-01-08
name
a 4 24
b 6 15
c 12 2
d 5 2
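A different route, not used in either answer, is to compute a week-start column first and pivot on it as an ordinary column, so no Grouper is needed. This sketch assumes weeks run Sunday to Saturday, which is what the 'W-SAT' period frequency encodes; the column name week is hypothetical:

```python
import pandas as pd

names = ['a', 'b', 'c', 'd'] * 7
dates = ['2017-01-11', '2017-01-08', '2017-01-14', '2017-01-05', '2017-01-10', '2017-01-13', '2017-01-02',
         '2017-01-12', '2017-01-10', '2017-01-05', '2017-01-01', '2017-01-04', '2017-01-11', '2017-01-14',
         '2017-01-05', '2017-01-06', '2017-01-14', '2017-01-11', '2017-01-06', '2017-01-05', '2017-01-08',
         '2017-01-10', '2017-01-07', '2017-01-04', '2017-01-02', '2017-01-04', '2017-01-01', '2017-01-12']
numbers = [4, 3, 2, 1] * 7
df = pd.DataFrame({'name': names, 'date': pd.to_datetime(dates), 'number': numbers})

# Hypothetical helper column: start (Sunday 00:00) of the Sunday-to-Saturday week containing each date
df['week'] = df['date'].dt.to_period('W-SAT').dt.start_time

# Plain pivot on the helper column; fill_value=0 covers weeks with no rows for a name
table = pd.pivot_table(df, values='number', index='name', columns='week',
                       aggfunc='sum', fill_value=0)
print(table)
```

Because the week is an explicit column, the same week labels can be reused for other groupings or merges.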