
How to do a conditional count after groupby on a Pandas Dataframe?

I have the following dataframe:

  key1 key2
0    a  one
1    a  two
2    b  one
3    b  two
4    a  one
5    c  two

Now, I want to group the dataframe by key1 and count the rows where key2 has the value "one", to get this result:

  key1
0    a  2
1    b  1
2    c  0

I just get the usual count with:

df.groupby(['key1']).size() 

But I don't know how to insert the condition.

I tried things like this:

df.groupby(['key1']).apply(df[df['key2'] == 'one']) 

But I can't get any further. How can I do this?

Asked by Sethias on Aug 18 '17 at 08:08



1 Answer

I think you need to apply the condition first:

# use this if you also need category c, which has no 'one' values
df11 = df.groupby('key1')['key2'].apply(lambda x: (x == 'one').sum()).reset_index(name='count')
print(df11)

  key1  count
0    a      2
1    b      1
2    c      0
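
The same count can also be written without a lambda. This is a minimal sketch, not part of the original answer: it rebuilds the sample frame from the question, and the names `is_one` and `counts` are only illustrative. Summing a boolean mask per group counts the matches, so a group with no match (here `c`) still shows up with 0.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a", "c"],
    "key2": ["one", "two", "one", "two", "one", "two"],
})

# Boolean mask: True where key2 equals 'one'.
is_one = df["key2"].eq("one")

# Summing booleans per key1 group counts the True values;
# a group with no True values sums to 0 instead of disappearing.
counts = is_one.groupby(df["key1"]).sum().reset_index(name="count")
print(counts)
```

Because the mask is computed once for the whole column, this tends to be faster than `apply` with a lambda on large frames.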

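Or convert key1 to categorical, so that size also reports the category that is missing after filtering:

df['key1'] = df['key1'].astype('category')
# observed=False keeps categories with no rows; newer pandas versions may need it spelled out
df1 = df[df['key2'] == 'one'].groupby(['key1'], observed=False).size().reset_index(name='count')
print(df1)

  key1  count
0    a      2
1    b      1
2    c      0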
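
If changing the dtype of key1 is not desirable, the missing group can probably be restored with `reindex` instead. This is a sketch under that assumption, not part of the original answer; `filtered`, `all_keys` and `counts` are illustrative names.

```python
import pandas as pd

# Rebuild the sample frame from the question (key1 stays a plain object column).
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a", "c"],
    "key2": ["one", "two", "one", "two", "one", "two"],
})

# Filter first, then count: group 'c' is absent here because it has no 'one' rows.
filtered = df[df["key2"] == "one"].groupby("key1").size()

# Every key that occurs in the unfiltered frame, in order of appearance.
all_keys = df["key1"].unique()

# Reindex against all keys and fill the missing groups with 0.
counts = (
    filtered.reindex(all_keys, fill_value=0)
    .rename_axis("key1")
    .reset_index(name="count")
)
print(counts)
```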

If you need all combinations:

df2 = df.groupby(['key1', 'key2']).size().reset_index(name='count')
print(df2)

  key1 key2  count
0    a  one      2
1    a  two      1
2    b  one      1
3    b  two      1
4    c  two      1

df3 = df.groupby(['key1', 'key2']).size().unstack(fill_value=0)
print(df3)

key2  one  two
key1
a       2    1
b       1    1
c       0    1
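
A wide table of counts per key1/key2 pair can also be produced with `pd.crosstab`, which fills absent combinations with 0. This is a minimal sketch, not taken from the original answer; `table` and `one_counts` are illustrative names.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a", "c"],
    "key2": ["one", "two", "one", "two", "one", "two"],
})

# Cross-tabulate key1 (rows) against key2 (columns); cells hold occurrence counts.
table = pd.crosstab(df["key1"], df["key2"])
print(table)

# The conditional count from the question is the 'one' column.
one_counts = table["one"].reset_index(name="count")
print(one_counts)
```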

Answered by jezrael on Oct 08 '22 at 20:10