Number of unique values per column by group

Tags:

python

pandas

Consider the following dataframe:

      A      B  E
0   bar    one  1
1   bar  three  1
2  flux    six  1
3  flux  three  2
4   foo   five  2
5   foo    one  1
6   foo    two  1
7   foo    two  2
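To make the examples in this thread reproducible, the frame above can be rebuilt directly. This is a minimal sketch that retypes the printed values; the dict-of-lists construction is my choice, not something from the original post.

```python
import pandas as pd

# Rebuild the example frame from the printed output above
df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})
print(df)
```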

I would like to find, for each value of A, the number of unique values in the other columns.

I thought the following would do it:

df.groupby('A').apply(lambda x: x.nunique())

but I get an error:

AttributeError: 'DataFrame' object has no attribute 'nunique'

I also tried with:

df.groupby('A').nunique()

but I also got the error:

AttributeError: 'DataFrameGroupBy' object has no attribute 'nunique'

Finally I tried with:

df.groupby('A').apply(lambda x: x.apply(lambda y: y.nunique()))

which returns:

      A  B  E
A            
bar   1  2  1
flux  1  2  2
foo   1  3  2

and seems to be correct. Strangely though, it also returns the column A in the result. Why?


asked Nov 18 '14 20:11

Amelio Vazquez-Reina

2 Answers

In the pandas version you are using, a DataFrame object doesn't have a nunique method; only Series objects do. You have to pick out the column you want to apply nunique() on, which you can do with simple attribute access (the dot operator):

df.groupby('A').apply(lambda x: x.B.nunique())

will print:

A
bar     2
flux    2
foo     3

And doing:

df.groupby('A').apply(lambda x: x.E.nunique())

will print:

A
bar     1
flux    2
foo     2
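pandas can also count unique values for a single grouped column without apply, by selecting the column after groupby and calling nunique on the resulting SeriesGroupBy. A minimal sketch, assuming a pandas version where SeriesGroupBy.nunique is available (current releases have it):

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})

# Select one column after grouping, then count distinct values per group
b_counts = df.groupby('A')['B'].nunique()
e_counts = df.groupby('A')['E'].nunique()
print(b_counts)
print(e_counts)
```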

Alternatively, you can count both columns in a single call by passing aggregate a dict that maps each column to a function:

df.groupby('A').aggregate({'B': lambda x: x.nunique(), 'E': lambda x: x.nunique()})

which will print:

      B  E
A
bar   2  1
flux  2  2
foo   3  2
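In more recent pandas versions, the lambdas in the aggregate call can be replaced by the string name 'nunique', which pandas resolves to its built-in method. A sketch, assuming a version that accepts 'nunique' as a string aggregation name:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})

# Same result as the lambda version, using the named aggregation 'nunique'
result = df.groupby('A').agg({'B': 'nunique', 'E': 'nunique'})
print(result)
```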

To answer your question about why your nested lambda also returns the A column: when you do a groupby/apply operation, you are iterating through three DataFrame objects, one sub-DataFrame of the original per group (bar, flux and foo). Calling apply on a sub-DataFrame applies the inner lambda to each of its columns, i.e. to each Series. Each sub-DataFrame has three columns (A, B and E), so nunique() is evaluated on three Series per group.
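The iteration can be made visible by looping over the groupby object yourself: each step yields a group key and its sub-DataFrame, which still contains the A column. This sketch mirrors the nested lambda from the question; the counts dict is only there for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})

counts = {}
for name, sub in df.groupby('A'):
    # sub is the slice of df for this group; it includes column A
    per_column = sub.apply(lambda s: s.nunique())  # one nunique() per column
    counts[name] = per_column.to_dict()
    print(name, counts[name])
```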

The first Series evaluated in each sub-DataFrame is the A Series, and since you grouped on A, every row in a given sub-DataFrame has the same value of A, so it has exactly one unique value. That is why the result contains an A column made up entirely of 1s.
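If you want the nested-apply approach without the A column, one option is to select the other columns before calling apply, so each sub-DataFrame contains only B and E. A minimal sketch, assuming a pandas version that supports selecting a list of columns on a DataFrameGroupBy:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})

# Restrict each group to B and E, then count distinct values per column
result = df.groupby('A')[['B', 'E']].apply(lambda g: g.apply(lambda s: s.nunique()))
print(result)
```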

answered Oct 27 '22 01:10

huu

I encountered the same problem. Upgrading pandas to the latest version solved the problem for me.

df.groupby('A').nunique()

The above code did not work for me with pandas 0.19.2. After upgrading to pandas 0.21.1, it worked.
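On a pandas version where DataFrameGroupBy.nunique exists, the one-liner from the question works directly. Depending on the version, the grouping column A may also appear in the result as a column of 1s; selecting B and E first keeps the output to the columns you care about. A sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})

# Whether A is included here depends on the pandas version
result_all = df.groupby('A').nunique()
# Explicit column selection gives the same B/E counts on any supporting version
result_be = df.groupby('A')[['B', 'E']].nunique()
print(result_all)
print(result_be)
```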

You can check the version using the following code:

import pandas as pd

print('Pandas version ' + pd.__version__)
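If the same script has to run on both old and new installations, you could branch on pd.__version__. This is only a sketch: pandas_at_least is a hypothetical helper, and the 0.20 cutoff is an assumption based on this answer's report that 0.19.2 lacked DataFrameGroupBy.nunique while 0.21.1 had it.

```python
import pandas as pd

def pandas_at_least(major, minor):
    """Hypothetical helper: compare only the first two parts of pd.__version__."""
    parts = pd.__version__.split('.')
    return (int(parts[0]), int(parts[1])) >= (major, minor)

df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})

if pandas_at_least(0, 20):  # assumed first release with DataFrameGroupBy.nunique
    result = df.groupby('A')[['B', 'E']].nunique()
else:
    # Fallback for older releases: apply Series.nunique column by column
    result = df.groupby('A')[['B', 'E']].apply(lambda g: g.apply(lambda s: s.nunique()))

print('Pandas version ' + pd.__version__)
print(result)
```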

answered Oct 27 '22 00:10

Aswitha Visvesvaran
