
Number of unique values per column by group

Tags:

python

pandas

Consider the following dataframe:

      A      B  E
0   bar    one  1
1   bar  three  1
2  flux    six  1
3  flux  three  2
4   foo   five  2
5   foo    one  1
6   foo    two  1
7   foo    two  2

I would like to find, for each value of A, the number of unique values in the other columns.

  1. I thought the following would do it:

    df.groupby('A').apply(lambda x: x.nunique())
    

    but I get an error:

    AttributeError: 'DataFrame' object has no attribute 'nunique'
    
  2. I also tried with:

    df.groupby('A').nunique()
    

    but I also got the error:

    AttributeError: 'DataFrameGroupBy' object has no attribute 'nunique'
    
  3. Finally I tried with:

    df.groupby('A').apply(lambda x: x.apply(lambda y: y.nunique()))
    

    which returns:

          A  B  E
    A            
    bar   1  2  1
    flux  1  2  2
    foo   1  3  2
    

    and seems to be correct. Strangely though, it also returns the column A in the result. Why?

Amelio Vazquez-Reina asked Nov 18 '14 20:11



2 Answers

In your version of pandas, DataFrame objects don't have a nunique method; only Series objects do. You have to pick the column you want to apply nunique() to, which you can do with attribute (dot) access:

df.groupby('A').apply(lambda x: x.B.nunique())

returns:

A
bar     2
flux    2
foo     3

And doing:

df.groupby('A').apply(lambda x: x.E.nunique())

returns:

A
bar     1
flux    2
foo     2

Alternatively you can do this with one function call using:

df.groupby('A').aggregate({'B': lambda x: x.nunique(), 'E': lambda x: x.nunique()})

which returns:

      B  E
A
bar   2  1
flux  2  2
foo   3  2
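As a sketch of a shorter spelling of the same aggregate call: pandas also accepts the string name of a built-in aggregation in place of a lambda. The DataFrame construction below only reproduces the example data from the question, and the variable name `counts` is illustrative.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "A": ["bar", "bar", "flux", "flux", "foo", "foo", "foo", "foo"],
    "B": ["one", "three", "six", "three", "five", "one", "two", "two"],
    "E": [1, 1, 1, 2, 2, 1, 1, 2],
})

# Map each non-grouping column to the name of the aggregation to apply
counts = df.groupby("A").agg({"B": "nunique", "E": "nunique"})
print(counts)
```

The result should match the table above, with one row per value of A and one column each for B and E.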

To answer your question about why your nested lambda returns the A column as well: when you do a groupby/apply operation, the function is called on three DataFrame objects, one per group (bar, flux, foo). Each one is a sub-DataFrame of the original and still contains all three columns (A, B and E). The inner apply then runs nunique() on every column of that sub-DataFrame, so it is applied to three Series per group, including A.

The first Series evaluated on each sub-DataFrame is the A Series, and since you've grouped by A, each sub-DataFrame contains only one unique value in A. This is why the result has an A column of all 1s.
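One way to see this for yourself is to iterate over the groups and count the unique values in every column of each sub-DataFrame. This is only a sketch of what the nested apply effectively computes; the data is the example from the question, and the name `per_group` is made up for illustration.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "A": ["bar", "bar", "flux", "flux", "foo", "foo", "foo", "foo"],
    "B": ["one", "three", "six", "three", "five", "one", "two", "two"],
    "E": [1, 1, 1, 2, 2, 1, 1, 2],
})

per_group = {}
for name, group in df.groupby("A"):
    # Each group is a sub-DataFrame that still carries column A,
    # and within one group A holds a single repeated value.
    per_group[name] = {col: group[col].nunique() for col in group.columns}

print(per_group["bar"])  # {'A': 1, 'B': 2, 'E': 1}
```

Every group reports 1 for A, which is the column of 1s in your output.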

huu answered Oct 27 '22 01:10


I encountered the same problem. Upgrading pandas to the latest version solved the problem for me.

df.groupby('A').nunique()

The code above did not work for me in pandas 0.19.2. After I upgraded to pandas 0.21.1, it worked.

You can check the version using the following code:

import pandas as pd
print('Pandas version ' + pd.__version__)
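For reference, a minimal sketch of the direct call on a pandas version where groupby objects support nunique (0.21.1 in my case). The data is the example from the question. Selecting B and E explicitly is my own choice here, so that the result contains only those two columns; `unique_counts` is just an illustrative name.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "A": ["bar", "bar", "flux", "flux", "foo", "foo", "foo", "foo"],
    "B": ["one", "three", "six", "three", "five", "one", "two", "two"],
    "E": [1, 1, 1, 2, 2, 1, 1, 2],
})

# Count distinct values of B and E within each group of A
unique_counts = df.groupby("A")[["B", "E"]].nunique()
print(unique_counts)
```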
Aswitha Visvesvaran answered Oct 27 '22 00:10