Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Adding a 'count' column to the result of a groupby in pandas?

Tags:

python

pandas

I think this is a fairly basic question, but I can't seem to find the solution.

I have a pandas dataframe similar to the following:

import pandas as pd  df = pd.DataFrame({'A' : ['x','x','y','z','z'],                    'B' : ['p','p','q','r','r']}) df 

which creates a table like this:

    A   B 0   x   p 1   x   p 2   y   q 3   z   r 4   z   r 

I'm trying to create a table that represents the number of distinct values in that dataframe. So my goal is something like this:

    A   B   c 0   x   p   2 1   y   q   1 2   z   r   2 

I can't find the correct functions to achieve this, though. I've tried:

df.groupby(['A','B']).agg('count') 

This produces a table with 3 rows (as expected) but without a 'count' column. I don't know how to add in that count column. Could someone point me in the right direction?

like image 874
Oliver Avatar asked Feb 13 '18 15:02

Oliver


People also ask

How do you count after Groupby in pandas?

Use count() by Column Name Use pandas DataFrame. groupby() to group the rows by column and use count() method to get the count for each group by ignoring None and Nan values.

How does pandas count sum and group by?

Use DataFrame. groupby(). sum() to group rows based on one or multiple columns and calculate sum agg function. groupby() function returns a DataFrameGroupBy object which contains an aggregate function sum() to calculate a sum of a given column for each group.

How do you count occurrences in a column in pandas?

Using the size() or count() method with pandas. DataFrame. groupby() will generate the count of a number of occurrences of data present in a particular column of the dataframe.

What is the difference between count and Value_counts in pandas?

count() should be used when you want to find the frequency of valid values present in columns with respect to specified col . . value_counts() should be used to find the frequencies of a series.


2 Answers

You can using size

df.groupby(['A','B']).size() Out[590]:  A  B x  p    2 y  q    1 z  r    2 dtype: int64 

For your solution adding one of the columns

df.groupby(['A','B']).B.agg('count') Out[591]:  A  B x  p    2 y  q    1 z  r    2 Name: B, dtype: int64 

Update :

df.groupby(['A','B']).B.agg('count').to_frame('c').reset_index()  #df.groupby(['A','B']).size().to_frame('c').reset_index() Out[593]:     A  B  c 0  x  p  2 1  y  q  1 2  z  r  2 
like image 85
BENY Avatar answered Oct 17 '22 03:10

BENY


pandas >= 1.1: DataFrame.value_counts

This is an identical replacement for df.groupby(['A', 'B']).size().

df.value_counts(['A', 'B'])  A  B z  r    2 x  p    2 y  q    1 dtype: int64 
df.value_counts(['A', 'B']).reset_index(name='c')     A  B  c 0  z  r  2 1  x  p  2 2  y  q  1 
like image 35
cs95 Avatar answered Oct 17 '22 03:10

cs95