
How to get number of groups in a groupby object in pandas?

This would be useful so I know how many unique groups I have to perform calculations on. Thank you.

Suppose the groupby object is called dfgroup.

wolfsatthedoor asked Jan 05 '15 21:01




1 Answer

[pandas >= 0.23] Simple, Fast, and Pandaic: ngroups

Newer versions of the groupby API provide this attribute, which stores the number of groups in a GroupBy object.

    import pandas as pd

    # setup
    df = pd.DataFrame({'A': list('aabbcccd')})
    dfg = df.groupby('A')

    # call `.ngroups` on the GroupBy object
    dfg.ngroups
    # 4

Note that this is different from GroupBy.groups, which returns the actual groups themselves.
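To make the distinction concrete, here is a minimal sketch that reuses the same toy frame as above. The variable names `n_groups` and `group_labels` are only illustrative and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({"A": list("aabbcccd")})
dfg = df.groupby("A")

# ngroups is a plain integer: how many groups there are
n_groups = dfg.ngroups

# groups is a dict-like mapping: group key -> row labels in that group
group_labels = dfg.groups

print(n_groups)                  # 4
print(sorted(group_labels))      # ['a', 'b', 'c', 'd']
print(list(group_labels["c"]))   # [4, 5, 6]
```

So if you only need the count, `ngroups` is enough. Reach for `groups` when you need to know which rows belong to which key.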

Why should I prefer this over len?

As noted in BrenBarn's answer, you could use len(dfg) to get the number of groups. But you shouldn't. Looking at the implementation of GroupBy.__len__ (which is what len() calls internally), we see that __len__ makes a call to GroupBy.groups, which returns a dictionary of grouped indices:

    dfg.groups
    {'a': Int64Index([0, 1], dtype='int64'),
     'b': Int64Index([2, 3], dtype='int64'),
     'c': Int64Index([4, 5, 6], dtype='int64'),
     'd': Int64Index([7], dtype='int64')}

Depending on the number of groups in your operation, generating the dictionary only to find its length is a wasteful step. ngroups on the other hand is a stored property that can be accessed in constant time.

This is documented under GroupBy object attributes. With len, by contrast, a GroupBy object that has a lot of groups can take much longer, because the dictionary has to be built first.
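If you want to check this on your own installation, a rough timing sketch follows. The frame `big`, its size, and the two helper functions are made up for illustration. Each call builds a fresh GroupBy so that nothing is reused between runs. Timings depend heavily on the pandas version and the data, and the gap may be small or absent if your version implements `__len__` differently from what is described above.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: many rows spread over many distinct keys
rng = np.random.default_rng(0)
big = pd.DataFrame({"key": rng.integers(0, 50_000, size=200_000)})


def count_with_len():
    # Fresh GroupBy each time, then len()
    return len(big.groupby("key"))


def count_with_ngroups():
    # Fresh GroupBy each time, then the ngroups attribute
    return big.groupby("key").ngroups


# Both approaches should agree on the answer
same_answer = count_with_len() == count_with_ngroups()

t_len = timeit.timeit(count_with_len, number=3)
t_ngroups = timeit.timeit(count_with_ngroups, number=3)

print(same_answer)
print(f"len: {t_len:.3f}s  ngroups: {t_ngroups:.3f}s")
```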

But what if I actually want the size of each group?

You're in luck. We have a function for that, it's called GroupBy.size. But please note that size counts NaNs as well. If you don't want NaNs counted, use GroupBy.count instead.
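A small sketch of the difference, using a made-up frame with some missing values in column `B` (the column names and data are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "a", "b", "b", "b"],
    "B": [1.0, np.nan, 2.0, np.nan, np.nan],
})
dfg = df.groupby("A")

# size: number of rows per group, NaNs in B included
sizes = dfg.size()

# count: number of non-NaN values of B per group
counts = dfg["B"].count()

print(sizes.to_dict())    # {'a': 2, 'b': 3}
print(counts.to_dict())   # {'a': 1, 'b': 1}
```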

cs95 answered Sep 21 '22 06:09
