 

How to GroupBy a Dataframe in Pandas and keep Columns

Tags:

python

pandas

Given a DataFrame that logs uses of some books, like this:

    Name   Type   ID
    Book1  ebook  1
    Book2  paper  2
    Book3  paper  3
    Book1  ebook  1
    Book2  paper  2
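To experiment with the answers below, the sample log can be rebuilt as a DataFrame. This is a minimal sketch; the variable name `df` is an assumption chosen to match the name the answers use.

```python
import pandas as pd

# One row per use of a book, copied from the table above
df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})
print(df)
```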

I need to count how many times each book was used, keeping the other columns, so that I get this:

    Name   Type   ID    Count
    Book1  ebook  1     2
    Book2  paper  2     2
    Book3  paper  3     1

How can this be done?

Thanks!

Asked Jul 22 '15 17:07 by Adrian Ribao





2 Answers

You want the following:

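    In [20]: df.groupby(['Name','Type','ID']).size().reset_index(name='Count')
    Out[20]:
        Name   Type  ID  Count
    0  Book1  ebook   1      2
    1  Book2  paper   2      2
    2  Book3  paper   3      1

In your case the 'Name', 'Type' and 'ID' columns always vary together (each Name has a single Type and ID), so we can group on all three. Because all three columns are grouping keys, count() has no remaining column to count, so call size() to get the number of rows in each group, then reset_index(name='Count') to turn the group keys back into ordinary columns and name the result.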
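The difference between count() and size() here can be checked directly. This sketch rebuilds the question's sample data (the names `keys`, `counted` and `result` are illustrative, not from the answer) and shows that count() leaves no Count column when every column is a grouping key, while size() produces the expected counts.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})
keys = ["Name", "Type", "ID"]

# count() only counts non-key columns; every column is a key here,
# so no count column is produced
counted = df.groupby(keys).count().reset_index()
print(counted.columns.tolist())

# size() counts rows per group regardless of which columns remain
result = df.groupby(keys).size().reset_index(name="Count")
print(result)
```

Note that count() skips missing values while size() counts every row, so the two would also differ on data with NaNs.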

An alternative approach would be to add the 'Count' column using transform and then call drop_duplicates:

    In [25]: df['Count'] = df.groupby(['Name'])['ID'].transform('count')
             df.drop_duplicates()
    Out[25]:
        Name   Type  ID  Count
    0  Book1  ebook   1      2
    1  Book2  paper   2      2
    2  Book3  paper   3      1
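A self-contained version of the transform approach, using the question's sample data, is sketched below. transform('count') returns a Series aligned with the original rows, so every row receives its group's count; after that, repeated uses of the same book are identical rows and drop_duplicates keeps one of each. The reset_index(drop=True) call is an addition for a clean 0..n-1 index and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# Broadcast each Name group's count of non-null IDs back onto every row
df["Count"] = df.groupby("Name")["ID"].transform("count")

# Rows for the same book are now identical, so keep the first of each
result = df.drop_duplicates().reset_index(drop=True)
print(result)
```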
Answered Sep 29 '22 13:09 by EdChum



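I think as_index=False should do the trick: it keeps the grouping columns as regular columns, so no reset_index call is needed. Since every column is a grouping key here, count() has nothing left to count, so use size() and rename its output column (size() returns a DataFrame with a 'size' column when as_index=False in pandas 1.1 and later):

    df.groupby(['Name','Type','ID'], as_index=False).size().rename(columns={'size': 'Count'})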
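As a quick check, the as_index=False route can be compared against the size().reset_index(name='Count') route on the question's data. This sketch assumes pandas 1.1 or later, where size() with as_index=False returns a DataFrame with a column named 'size'; the names `result` and `reference` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})
keys = ["Name", "Type", "ID"]

# as_index=False keeps the keys as columns; size() adds a 'size' column
result = (
    df.groupby(keys, as_index=False)
      .size()
      .rename(columns={"size": "Count"})
)

# Same counts via the index-based route, for comparison
reference = df.groupby(keys).size().reset_index(name="Count")
print(result)
```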
Answered Sep 29 '22 12:09 by jpobst
