How to groupby based on two columns in pandas?

A similar question might have been asked before, but I couldn't find one that fits my problem exactly. I want to group a DataFrame by two columns. For example, to turn this:

id product quantity
1  A       2
1  A       3
1  B       2
2  A       1
2  B       1
3  B       2
3  B       1

Into this:

id product quantity
1  A       5
1  B       2
2  A       1
2  B       1
3  B       3

That is, the "quantity" column should be summed over rows that share the same "id" and the same "product".

ARASH asked Apr 05 '17 04:04



2 Answers

You need groupby with the parameter as_index=False so that a DataFrame is returned, then aggregate with sum:

df = df.groupby(['id','product'], as_index=False)['quantity'].sum()
print(df)
   id product  quantity
0   1       A         5
1   1       B         2
2   2       A         1
3   2       B         1
4   3       B         3

Or leave as_index at its default and add reset_index to turn the group keys back into columns:

df = df.groupby(['id','product'])['quantity'].sum().reset_index()
print(df)
   id product  quantity
0   1       A         5
1   1       B         2
2   2       A         1
3   2       B         1
4   3       B         3
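
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the question's table is built by hand as a DataFrame; the names result and result_alt are only illustrative and are not from the original answer.

```python
import pandas as pd

# Sample data typed in from the question (an assumption about how df is built)
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3],
    "product": ["A", "A", "B", "A", "B", "B", "B"],
    "quantity": [2, 3, 2, 1, 1, 2, 1],
})

# as_index=False keeps the group keys as ordinary columns
result = df.groupby(["id", "product"], as_index=False)["quantity"].sum()

# Same aggregation; here the keys start in the index and reset_index()
# moves them back into columns
result_alt = df.groupby(["id", "product"])["quantity"].sum().reset_index()

print(result)
```

Both variants should give the same five rows; which one to use is mostly a matter of taste.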
jezrael answered Oct 05 '22 12:10


You can use pivot_table with aggfunc='sum', passing the column to aggregate as values and the two grouping columns as index:

df.pivot_table(values='quantity', index=['id', 'product'], aggfunc='sum').reset_index()

   id product  quantity
0   1       A         5
1   1       B         2
2   2       A         1
3   2       B         1
4   3       B         3
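
A runnable sketch of the pivot_table route, again assuming the question's data is typed in by hand. The variable name result is illustrative, and the keyword arguments values and index are used instead of positional ones purely for readability.

```python
import pandas as pd

# Sample data typed in from the question (an assumption about how df is built)
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3],
    "product": ["A", "A", "B", "A", "B", "B", "B"],
    "quantity": [2, 3, 2, 1, 1, 2, 1],
})

# With no `columns` argument, pivot_table acts like a grouped aggregation:
# the index holds the (id, product) pairs and quantity is summed per pair
result = df.pivot_table(
    values="quantity",
    index=["id", "product"],
    aggfunc="sum",
).reset_index()  # move id and product from the index back into columns

print(result)
```

Note that pivot_table defaults to aggfunc='mean', so the aggfunc='sum' argument is what makes this match the groupby result.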
piRSquared answered Oct 05 '22 14:10