Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas groupby take counts greater than 1

Tags:

python

pandas

This must be a simple question, but, however, it is bugging my head for a while.

For a dataframe below:

df = pd.DataFrame({'c0': ['a','b','a'],'c1': ['a','bb','a'],'c2':[10,20,30]})
  c0  c1  c2
0  a   a  10
1  b  bb  20
2  a   a  30

How to get output where count > 1?

I have tried:

df.groupby(['c0','c1'])['c2'].count()
c0  c1
a   a     2
b   bb    1

Required is:

c0  c1
a   a     2

I am looking other than

x = df.groupby(['c0','c1'])['c2'].count()
x[x>1]

i.e. a one-liner answer.

like image 577
BhishanPoudel Avatar asked Mar 26 '19 15:03

BhishanPoudel


People also ask

How do you count after Groupby in pandas?

Use count() by Column Name Use pandas DataFrame. groupby() to group the rows by column and use count() method to get the count for each group by ignoring None and Nan values.

How do you use Groupby and count?

The GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country". The GROUP BY statement is often used with aggregate functions ( COUNT() , MAX() , MIN() , SUM() , AVG() ) to group the result-set by one or more columns.

How do you use greater than and less than in pandas?

le (equivalent to <= ) — less than or equals to. lt (equivalent to < ) — less than. ge (equivalent to >= ) — greater than or equals to. gt (equivalent to > ) — greater than.


1 Answers

Use GroupBy.transform for Series with same size like original DataFrame:

df1 = df[df.groupby(['c0','c1'])['c2'].transform('count') > 1]

Or use DataFrame.duplicated for filtered all dupe rows by specified columns in list:

df1 = df[df.duplicated(['c0','c1'], keep=False)]

If performance is in not important or small DataFrame use DataFrameGroupBy.filter:

df1 = df.groupby(['c0','c1']).filter(lambda x: len(x) > 1)
like image 60
jezrael Avatar answered Oct 18 '22 04:10

jezrael