Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas dataframe group by multiple columns

Tags:

python

pandas

Given a dataframe with two datetime columns A and B and a numeric column C, how to group by month of both A and B and sum(C) i.e.

In [1]: df
Out[1]: 

      A           B            C  
0  2013-01-01  2013-01-01  0.282863 
1  2013-01-02  2013-01-01  0.173215 
2  2013-02-03  2013-02-04  2.104569 
3  2013-02-09  2013-04-15  0.706771 
4  2013-03-05  2013-08-01  0.567020 
5  2013-03-06  2013-04-01  0.113648
like image 701
DougKruger Avatar asked Sep 25 '17 15:09

DougKruger


Video Answer


2 Answers

By using groupby

df.groupby([df.A.dt.month,df.B.dt.month]).C.sum()

Out[954]: 
A  B
1  1    0.456078
2  2    2.104569
   4    0.706771
3  4    0.113648
   8    0.567020
Name: C, dtype: float64

Note: By using this , make sure A and B are datetime format If not , do following code before groupby

df.A=pd.to_datetime(df.A)
df.B=pd.to_datetime(df.B) 
like image 83
BENY Avatar answered Oct 19 '22 02:10

BENY


I recently just read about a new function that makes grouping by dates super easy.

 df.A=pd.to_datetime(df.A)
 df.B=pd.to_datetime(df.B)

 df.groupby([pd.Grouper(key='A', freq='M'), pd.Grouper(key='B', freq='M')])['C'].sum()

The number of options this opens up makes it worth looking into:

Source: http://pbpython.com/pandas-grouper-agg.html

Different Date aliases: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases

like image 4
Yale Newman Avatar answered Oct 19 '22 01:10

Yale Newman