Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas Counting Unique Rows

I have a pandas data frame similar to:

ColA ColB
1    1
1    1
1    1
1    2
1    2
2    1
3    2

I want an output that has the same function as Counter. I need to know how many time each row appears (with all of the columns being the same.

In this case the proper output would be:

ColA ColB Count
1    1    3
1    2    2
2    1    1
3    2    1

I have tried something of the sort:

df.groupby(['ColA','ColB']).ColA.count()

but this gives me some ugly output I am having trouble formatting

like image 860
qwertylpc Avatar asked Mar 15 '16 18:03

qwertylpc


People also ask

How do I count unique rows in pandas DataFrame?

You can use the nunique() function to count the number of unique values in a pandas DataFrame.

How do I get unique values in pandas?

You can get unique values in column (multiple columns) from pandas DataFrame using unique() or Series. unique() functions. unique() from Series is used to get unique values from a single column and the other one is used to get from multiple columns.


2 Answers

You can use size with reset_index:

print df.groupby(['ColA','ColB']).size().reset_index(name='Count')
   ColA  ColB  Count
0     1     1      3
1     1     2      2
2     2     1      1
3     3     2      1
like image 183
jezrael Avatar answered Sep 22 '22 23:09

jezrael


I only needed to count the unique rows and have used the DataFrame.drop_duplicates alternative as below:

len(df[['ColA', 'ColB']].drop_duplicates())

It was twice as fast on my data than len(df.groupby(['ColA', 'ColB'])).

like image 43
eddygeek Avatar answered Sep 22 '22 23:09

eddygeek