Consider the following dataframe:
      A      B  E
0   bar    one  1
1   bar  three  1
2  flux    six  1
3  flux  three  2
4   foo   five  2
5   foo    one  1
6   foo    two  1
7   foo    two  2
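For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the same frame. It assumes pandas is imported as `pd`, which is not shown in the question; the values are copied from the table above.

```python
import pandas as pd

# Rebuild the example frame column by column, using the values from the table.
df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})
print(df)
```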
I would like to find, for each value of A, the number of unique values in the other columns.
I thought the following would do it:
df.groupby('A').apply(lambda x: x.nunique())
but I get an error:
AttributeError: 'DataFrame' object has no attribute 'nunique'
I also tried with:
df.groupby('A').nunique()
but I also got the error:
AttributeError: 'DataFrameGroupBy' object has no attribute 'nunique'
Finally I tried with:
df.groupby('A').apply(lambda x: x.apply(lambda y: y.nunique()))
which returns:
      A  B  E
A
bar   1  2  1
flux  1  2  2
foo   1  3  2
and seems to be correct. Strangely though, it also returns the column A in the result. Why?
The DataFrame object doesn't have nunique (at least in the pandas version you are running), only Series does. You have to pick out which column you want to apply nunique() on. You can do this with a simple dot operator:
df.groupby('A').apply(lambda x: x.B.nunique())
will print:
A
bar 2
flux 2
foo 3
And doing:
df.groupby('A').apply(lambda x: x.E.nunique())
will print:
A
bar 1
flux 2
foo 2
Alternatively you can do this with one function call using:
df.groupby('A').aggregate({'B': lambda x: x.nunique(), 'E': lambda x: x.nunique()})
which will print:
      B  E
A
bar   2  1
flux  2  2
foo   3  2
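As a shorter variant of the same idea, the lambdas can usually be replaced by the string name of the method, or the column can be selected on the grouped object before calling nunique(). This is only a sketch: it rebuilds the example frame itself, and it assumes a pandas release in which the grouped Series exposes nunique.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})

# Name the aggregation as a string instead of wrapping it in a lambda.
counts = df.groupby('A').agg({'B': 'nunique', 'E': 'nunique'})
print(counts)

# For a single column, select it on the grouped object and call nunique().
b_only = df.groupby('A')['B'].nunique()
print(b_only)
```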
To answer your question about why your nested lambda prints the A column as well: when you do a groupby/apply operation, you are iterating through three DataFrame objects, one per group. Each DataFrame object is a sub-DataFrame of the original. Applying an operation to it applies the operation to each Series. There are three Series (A, B and E) in each DataFrame that you're applying the nunique() operator to.
The first Series being evaluated on each DataFrame is the A Series, and since you've done a groupby on A, you know that in each DataFrame there is only one unique value in the A Series. This explains why you're ultimately given an A result column with all 1's.
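To see this directly, you can loop over the groups yourself and count the distinct values of A in each sub-DataFrame. This is a small sketch that rebuilds the example frame; the name `a_counts` is only for illustration.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})

# Iterating over a groupby yields (group key, sub-DataFrame) pairs.
a_counts = {}
for key, group in df.groupby('A'):
    # Every row in this sub-DataFrame has the same value in A,
    # so A contains exactly one distinct value.
    a_counts[key] = group['A'].nunique()

print(a_counts)
```

Each of the three groups reports a single distinct value for A, which is where the column of 1's comes from.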
I encountered the same problem. Upgrading pandas to the latest version solved the problem for me.
df.groupby('A').nunique()
The above code did not work for me in Pandas version 0.19.2. I upgraded it to Pandas version 0.21.1 and it worked.
You can check the version using the following code:
import pandas as pd
print('Pandas version ' + pd.__version__)
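If upgrading is not an option, one possible workaround is to test whether the grouped object offers nunique and fall back to a per-column lambda otherwise. This is a rough sketch under that assumption, not something I tested on 0.19.2; it rebuilds the example frame, and `counts` is just an illustrative name.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})

grouped = df.groupby('A')

if hasattr(grouped, 'nunique'):
    # Releases that provide nunique on the grouped DataFrame.
    counts = grouped.nunique()
else:
    # Fallback: apply Series.nunique to each column of each group.
    counts = grouped.agg(lambda s: s.nunique())

# Keep only the non-grouping columns; some releases also report A.
counts = counts[['B', 'E']]
print(counts)
```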