<p>This would be useful so I know how many unique groups I have to perform calculations on. Thank you.</p> <p>Suppose the groupby object is called <code>dfgroup</code>.</p>

How to get number of groups in a groupby object in pandas?

[pandas >= 0.23] Simple, Fast, and Pandaic: `ngroups`

Newer versions of the groupby API provide this attribute (originally undocumented), which stores the number of groups in a GroupBy object.

```python
# setup
import pandas as pd

df = pd.DataFrame({'A': list('aabbcccd')})
dfg = df.groupby('A')
```

```python
# call `.ngroups` on the GroupBy object
dfg.ngroups
# 4
```
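Which groups does `ngroups` actually count? Here is a minimal sketch on a small made-up DataFrame; the column names and values are illustrative only. Under the default `dropna=True`, rows whose key is NaN are excluded. `dropna=False` (available since pandas 1.1) keeps NaN as its own group. With several ordinary (non-categorical) keys, only the combinations that actually occur are counted.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one NaN key in column A.
df = pd.DataFrame({'A': ['a', 'a', 'b', np.nan], 'B': [1, 2, 1, 1]})

# Default dropna=True: the NaN row is left out, so only 'a' and 'b' remain.
n_default = df.groupby('A').ngroups
# dropna=False keeps NaN as a separate group.
n_with_nan = df.groupby('A', dropna=False).ngroups
# Two keys: observed pairs (a, 1), (a, 2), (b, 1); the NaN row is dropped.
n_pairs = df.groupby(['A', 'B']).ngroups

print(n_default, n_with_nan, n_pairs)  # 2 3 3
```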

Note that this is different from `GroupBy.groups`, which returns the actual groups themselves.
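To make the distinction concrete, here is a small sketch that rebuilds the setup DataFrame. `groups` maps each group label to the index labels of its rows. `ngroups` is just the count, and `len(dfg)` reports the same number.

```python
import pandas as pd

df = pd.DataFrame({'A': list('aabbcccd')})
dfg = df.groupby('A')

# `groups` is a dict-like mapping: group label -> index labels of its rows.
labels = list(dfg.groups)
rows_in_c = dfg.groups['c'].tolist()

# `ngroups` is only the count; len(dfg) gives the same number.
print(labels, rows_in_c, dfg.ngroups, len(dfg))  # ['a', 'b', 'c', 'd'] [4, 5, 6] 4 4
```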

Why should I prefer this over `len`?

As noted in BrenBarn's answer, you could use `len(dfg)` to get the number of groups. But you shouldn't. If you look at the implementation of `GroupBy.__len__` (which is what `len()` calls internally), you'll see that `__len__` calls `GroupBy.groups`, which returns a dictionary of grouped indices:

```python
dfg.groups
# {'a': Int64Index([0, 1], dtype='int64'),
#  'b': Int64Index([2, 3], dtype='int64'),
#  'c': Int64Index([4, 5, 6], dtype='int64'),
#  'd': Int64Index([7], dtype='int64')}
```

Depending on the number of groups in your operation, generating the dictionary only to find its length is a wasteful step. `ngroups`, on the other hand, is a stored property that can be accessed in constant time.

`ngroups` is now listed among the documented `GroupBy` object attributes. The problem with `len`, by contrast, is that it can take considerably longer on a GroupBy object with many groups.
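If you want to check the cost on your own data, a rough timing sketch like the one below compares the two approaches on a randomly generated frame. The row and key counts are arbitrary assumptions. Treat the numbers as indicative only. The gap, if any, depends on your pandas version and hardware, and newer releases may implement `__len__` differently from what is described above.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 200,000 rows spread over up to 20,000 integer keys.
rng = np.random.default_rng(0)
df = pd.DataFrame({'key': rng.integers(0, 20_000, size=200_000)})

# Both approaches report the same number of groups.
n_len = len(df.groupby('key'))
n_ngroups = df.groupby('key').ngroups
assert n_len == n_ngroups

# Build a fresh GroupBy inside each timed call, because pandas may cache
# group metadata on an existing GroupBy object after the first access.
t_len = timeit.timeit(lambda: len(df.groupby('key')), number=5)
t_ngroups = timeit.timeit(lambda: df.groupby('key').ngroups, number=5)

print(f"len(groupby):    {t_len:.4f} s for 5 calls")
print(f"groupby.ngroups: {t_ngroups:.4f} s for 5 calls")
```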

But what if I actually want the size of each group?

You're in luck. There's a function for that: `GroupBy.size`. Note, however, that `size` counts every row in each group, NaNs included. If you don't want NaNs counted, use `GroupBy.count` instead, which counts only non-null values (per column).
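Here is a short sketch of the difference on a hypothetical two-column DataFrame in which column `B` contains some NaNs. `size` counts rows per group, while `count` on `B` skips the missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column B has NaNs in groups 'a' and 'c'.
df = pd.DataFrame({'A': list('aabbcccd'),
                   'B': [1, np.nan, 3, 4, np.nan, np.nan, 7, 8]})
dfg = df.groupby('A')

sizes = dfg.size()          # rows per group, NaNs included
counts = dfg['B'].count()   # non-null values of B per group

print(pd.DataFrame({'size': sizes, 'count': counts}))
# sizes:  a=2, b=2, c=3, d=1
# counts: a=1, b=2, c=1, d=1
```

Calling `dfg.count()` without selecting a column returns these non-null counts for every non-grouping column at once.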

answered Sep 21 '22 06:09

cs95

Tags: python, pandas, dataframe, group-by, pandas-groupby

wolfsatthedoor
