Creating a pivot table in pandas and grouping at the same time the dates per week

Tags: python, pandas, pandas-groupby

I want to create a pivot table with pd.pivot_table in Python where one column holds dates, and I also want to group the results on a weekly basis. Here's a simple example. I have the following DataFrame:

import pandas as pd

names = ['a', 'b', 'c', 'd'] * 7
dates = ['2017-01-11', '2017-01-08', '2017-01-14', '2017-01-05', '2017-01-10', '2017-01-13', '2017-01-02', '2017-01-12', '2017-01-10', '2017-01-05', '2017-01-01', '2017-01-04', '2017-01-11', '2017-01-14', '2017-01-05', '2017-01-06', '2017-01-14', '2017-01-11', '2017-01-06', '2017-01-05', '2017-01-08', '2017-01-10', '2017-01-07', '2017-01-04', '2017-01-02', '2017-01-04', '2017-01-01', '2017-01-12']
dates = [pd.to_datetime(i).date() for i in dates]
numbers = [4, 3, 2, 1 ] * 7
data = {'name': names , 'date': dates, 'number': numbers}

df = pd.DataFrame(data)

which yields:

          date name  number
0   2017-01-11    a       4
1   2017-01-08    b       3
2   2017-01-14    c       2
3   2017-01-05    d       1
4   2017-01-10    a       4
5   2017-01-13    b       3
6   2017-01-02    c       2
7   2017-01-12    d       1
8   2017-01-10    a       4
9   2017-01-05    b       3
10  2017-01-01    c       2
11  2017-01-04    d       1
12  2017-01-11    a       4
13  2017-01-14    b       3
14  2017-01-05    c       2
15  2017-01-06    d       1
16  2017-01-14    a       4
17  2017-01-11    b       3
18  2017-01-06    c       2
19  2017-01-05    d       1
20  2017-01-08    a       4
21  2017-01-10    b       3
22  2017-01-07    c       2
23  2017-01-04    d       1
24  2017-01-02    a       4
25  2017-01-04    b       3
26  2017-01-01    c       2
27  2017-01-12    d       1

I want to create a pivot table where the rows are the names, the columns are the dates grouped by week, and the values are the sums of the number column. For example, the first row of the pivot table would be:

   2017-01-01  2017-01-08  2017-01-15  ...
a           4          24           0

What I am doing is:

pd.pivot_table(data=df, values='number', columns=pd.Grouper(key='date', freq='1W'), index='name', aggfunc=sum)

but I get the following error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'.

How am I supposed to do that? I don't know if I can use the date as an index, since the date values are not all unique.

asked Dec 17 '17 23:12

thanasissdr

2 Answers

IIUC:

First, make sure that the date column has a datetime dtype (in the question it holds plain Python date objects, so its dtype is object):

df['date'] = pd.to_datetime(df['date'], errors='coerce')
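
To see why the conversion matters, here is a minimal sketch on a two-row frame (the values are made up for illustration; only the column names come from the question). Calling `.date()` on each timestamp, as the question does, leaves plain `datetime.date` objects in the column, so pandas stores it as `object` and has nothing datetime-like to bin by week.

```python
import pandas as pd

# Toy frame built the same way as in the question (illustrative values).
df = pd.DataFrame({
    'date': [pd.to_datetime(s).date() for s in ['2017-01-01', '2017-01-08']],
    'number': [1, 2],
})

dtype_before = df['date'].dtype   # object: plain datetime.date values

# Convert to a real datetime64 column; unparseable entries would become NaT.
df['date'] = pd.to_datetime(df['date'], errors='coerce')
dtype_after = df['date'].dtype    # a datetime64 dtype

print(dtype_before, dtype_after)
```

Checking `df.dtypes` before grouping is a cheap way to catch this kind of problem early.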

then you can group, sum and unstack:

In [289]: (df.groupby(['name', pd.Grouper(freq='W', key='date')])
             ['number']
             .sum()
             .unstack(fill_value=0))
Out[289]:
date  2017-01-01  2017-01-08  2017-01-15
name
a              0           8          20
b              0           9          12
c              4           8           2
d              0           5           2

Or, as proposed by @thanasissdr, close each weekly bin on the left so that it runs from Sunday through Saturday:

In [328]: (df.groupby(['name', pd.Grouper(freq='W', key='date', closed='left')])
             ['number']
             .sum()
             .unstack(fill_value=0))
Out[328]:
date  2017-01-08  2017-01-15
name
a              4          24
b              6          15
c             12           2
d              5           2

To label each week by its starting Sunday, shift the dates back by seven days before grouping:

In [330]: (df.assign(date=df['date']-pd.offsets.Day(7))
     ...:    .groupby(['name', pd.Grouper(freq='W', key='date', closed='left')])
     ...:    ['number']
     ...:    .sum()
     ...:    .unstack(fill_value=0))
     ...:
Out[330]:
date  2017-01-01  2017-01-08
name
a              4          24
b              6          15
c             12           2
d              5           2
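
Shifting the dates by seven days is one way to get week-start labels. A possible alternative, sketched below on a few rows copied from the question's frame, is to pass `label='left'` together with `closed='left'` to `pd.Grouper`, so that each bin `[Sunday, next Sunday)` is named after its starting Sunday. This assumes the default anchor of `'W'` (weeks ending on Sunday) is the one you want.

```python
import pandas as pd

# A few rows copied from the question's frame, with a datetime64 'date' column.
df = pd.DataFrame({
    'name': ['a', 'a', 'a', 'b', 'b', 'c', 'c'],
    'date': pd.to_datetime(['2017-01-02', '2017-01-08', '2017-01-11',
                            '2017-01-05', '2017-01-13',
                            '2017-01-01', '2017-01-14']),
    'number': [4, 4, 4, 3, 3, 2, 2],
})

# closed='left' -> each bin covers [Sunday, next Sunday)
# label='left'  -> each bin is named after its starting Sunday
weekly = (df.groupby(['name', pd.Grouper(key='date', freq='W',
                                         closed='left', label='left')])
            ['number']
            .sum()
            .unstack(fill_value=0))

print(weekly)
```

The original dates stay untouched this way, which may be preferable if the frame is reused later.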

answered Sep 23 '22 11:09

MaxU - stop WAR against UA

Continuing with my logic, we can create a multi-index, where the date is part of the index. So we can have:

import pandas as pd

names = ['a', 'b', 'c', 'd'] * 7
dates = ['2017-01-11', '2017-01-08', '2017-01-14', '2017-01-05', '2017-01-10', '2017-01-13', '2017-01-02', '2017-01-12', '2017-01-10', '2017-01-05', '2017-01-01', '2017-01-04', '2017-01-11', '2017-01-14', '2017-01-05', '2017-01-06', '2017-01-14', '2017-01-11', '2017-01-06', '2017-01-05', '2017-01-08', '2017-01-10', '2017-01-07', '2017-01-04', '2017-01-02', '2017-01-04', '2017-01-01', '2017-01-12']
# Keep the dates as datetime64 values so that pd.Grouper can bin them.
dates = pd.to_datetime(dates)
numbers = [4, 3, 2, 1] * 7
data = {'name': names, 'date': dates, 'number': numbers}

df = pd.DataFrame(data)

# Add the date as a second index level next to the original row index.
df.set_index([df.index, df.date], inplace=True)

# values=['number'] restricts the aggregation to the numeric column.
print(pd.pivot_table(data=df, values=['number'], columns=pd.Grouper(freq='7D', level='date', closed='left'), index='name', aggfunc='sum'))

which yields exactly:

         number           
date 2017-01-01 2017-01-08
name                      
a             4         24
b             6         15
c            12          2
d             5          2
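
The multi-index step may not be needed. As a hedged sketch, once `date` is a datetime64 column, `pd.Grouper(key='date', ...)` can be passed straight to `pd.pivot_table`, which is what the question originally attempted. The rows below are a small subset of the question's frame; `closed='left'` and `label='left'` are my choice here to get Sunday-to-Saturday weeks labelled by their first day.

```python
import pandas as pd

# A few rows copied from the question's frame, with a datetime64 'date' column.
df = pd.DataFrame({
    'name': ['a', 'a', 'a', 'b', 'b', 'c', 'c'],
    'date': pd.to_datetime(['2017-01-02', '2017-01-08', '2017-01-11',
                            '2017-01-05', '2017-01-13',
                            '2017-01-01', '2017-01-14']),
    'number': [4, 4, 4, 3, 3, 2, 2],
})

# Group the columns by week directly through the 'date' column.
table = pd.pivot_table(
    data=df,
    values='number',
    index='name',
    columns=pd.Grouper(key='date', freq='W', closed='left', label='left'),
    aggfunc='sum',
    fill_value=0,   # weeks with no rows for a name show 0 instead of NaN
)

print(table)
```

Unlike `'7D'`, whose bins start at the first date in the data, `'W'` is anchored to the calendar, so the bins should stay Sunday-based even if the earliest date changes.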

answered Sep 22 '22 11:09

thanasissdr
