Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Counting unique index values in Pandas groupby

Tags:

python

pandas

In Pandas, there is a very clean way to count the distinct values in a column within a group by operation. For example

ex = pd.DataFrame([[1, 2, 3], [6, 7, 8], [1, 7, 9]], 
                  columns=["A", "B", "C"]).set_index(["A", "B"])
ex.groupby(level="A").C.nunique()

will return

A
1    2
6    1
Name: C, dtype: int64

I would also like to count the distinct values in index level B while grouping by A. I can't find a clean way to access the levels of B from the groupby object. The best I've been able to come up with is:

ex.reset_index("B", drop=False).groupby(level="A").B.nunique()

which correctly returns:

A
1    2
6    1
Name: B, dtype: int64 

Is there a way for me to do this on the groupby without resetting the index or using an apply function?

like image 424
Tim Hopper Avatar asked Feb 03 '16 13:02

Tim Hopper


People also ask

How do you count unique values in pandas Groupby?

Method 1: Count unique values using nunique() The Pandas dataframe. nunique() function returns a series with the specified axis's total number of unique observations. The total number of distinct observations over the index axis is discovered if we set the value of the axis to 0.

How do you count unique entries in pandas?

To count unique values in the pandas dataframe column use Series. unique() function and then call the size to get the count. Series.

How do you count indexes in pandas?

You can use len(df. index) to find the number of rows in pandas DataFrame, df. index returns RangeIndex(start=0, stop=8, step=1) and use it on len() to get the count.

How do I count the number of rows in Groupby pandas?

The most simple method for pandas groupby count is by using the in-built pandas method named size(). It returns a pandas series that possess the total number of row count for each group. The basic working of the size() method is the same as len() method and hence, it is not affected by NaN values in the dataset.


1 Answers

IIUC you could do reset_index for all levels, then groupby be 'A' and apply nunique method:

res = ex.reset_index().groupby('A').agg(lambda x: x.nunique())

In [339]: res
Out[339]:
   B  C
A
1  2  2
6  1  1

Same solution with pivot_table:

In [341]: ex.reset_index().pivot_table(index='A', aggfunc=lambda x: x.nunique())
Out[341]:
   B  C
A
1  2  2
6  1  1
like image 75
Anton Protopopov Avatar answered Sep 28 '22 06:09

Anton Protopopov