 

Display rows where any value in a particular column occurs more than once

Tags:

python

pandas

I want to display all the rows whose value in the "website" column occurs more than once. For example, if the website "xyz.com" appears in several rows, I want to display all of those rows. I am using the following code:

df[df.website.isin(df.groupby('website').website.count() > 1)]

The code above returns zero rows, but running the following code shows that many websites occur more than once:

df.website.value_counts()

How should I modify my 1st line of code to display all such rows?
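A likely reason the first line returns nothing: `df.groupby('website').website.count() > 1` is a boolean Series indexed by website, and `Series.isin` compares against the *values* of whatever it is given (here `True`/`False`), not its index. No website string equals `True` or `False`, so every row is filtered out. The sketch below reproduces this on a small hypothetical DataFrame and shows one minimal fix that passes the index of the flagged entries instead.

```python
import pandas as pd

# Hypothetical data for illustration; only the 'website' column matters.
df = pd.DataFrame({
    "col1": list("ABCDE"),
    "website": ["abc.com", "abc.com", "abc.net", "xyz.com", "xyz.com"],
})

# Boolean Series indexed by website, e.g. abc.com -> True, abc.net -> False.
is_repeated = df.groupby("website").website.count() > 1

# isin() checks membership against the *values* of is_repeated (True/False),
# not its index, so no website string matches and the result is empty.
broken = df[df.website.isin(is_repeated)]

# Fix: pass the website names themselves, i.e. the index entries flagged True.
fixed = df[df.website.isin(is_repeated[is_repeated].index)]

print(len(broken))  # 0
print(fixed)
```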

ComplexData asked Mar 11 '23 18:03


1 Answer

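Use DataFrame.duplicated with subset='website' and keep=False. With keep=False, every row whose website value appears more than once is marked as a duplicate, including the first occurrence: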

df[df.duplicated(subset='website', keep=False)]
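The `keep` argument is what makes this work. By default (`keep='first'`), `duplicated` leaves the first occurrence of each value unflagged, so filtering with it would drop one row from every repeated group. The small example below, on hypothetical data, compares the two masks.

```python
import pandas as pd

df = pd.DataFrame({"website": ["abc.com", "abc.com", "xyz.com", "foo.bar"]})

# Default keep='first': the first occurrence is NOT flagged, later ones are.
first_only = df.duplicated(subset="website")

# keep=False: every row whose value occurs more than once is flagged.
all_dupes = df.duplicated(subset="website", keep=False)

print(first_only.tolist())  # [False, True, False, False]
print(all_dupes.tolist())   # [True, True, False, False]
```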

Sample Input:

  col1  website
0    A  abc.com
1    B  abc.com
2    C  abc.com
3    D  abc.net
4    E  xyz.com
5    F  foo.bar
6    G  xyz.com
7    H  foo.baz 

Sample Output:

  col1  website
0    A  abc.com
1    B  abc.com
2    C  abc.com
4    E  xyz.com
6    G  xyz.com
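The sample input and output above can be reproduced end to end as follows. The second filter is an alternative not taken from the answer: it broadcasts each website's group size back onto its rows with `groupby(...).transform("size")` and keeps rows whose group has more than one member, which selects the same rows.

```python
import pandas as pd

# The sample input shown above.
df = pd.DataFrame({
    "col1": list("ABCDEFGH"),
    "website": ["abc.com", "abc.com", "abc.com", "abc.net",
                "xyz.com", "foo.bar", "xyz.com", "foo.baz"],
})

# Approach from the answer: flag every member of a repeated group.
out = df[df.duplicated(subset="website", keep=False)]

# Alternative: each row gets the size of its website group; keep sizes > 1.
alt = df[df.groupby("website")["col1"].transform("size") > 1]

print(out)
print(out.index.tolist())  # [0, 1, 2, 4, 6]
```

The `duplicated` version is usually the more direct choice; the `transform` version generalizes more easily if the threshold changes (for example, websites that occur at least three times).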
root answered May 04 '23 11:05