I am stuck on a seemingly easy problem: dropping unique rows in a pandas DataFrame. Basically, the opposite of drop_duplicates().
Let's say this is my data:
     A  B  C
0  foo  0  A
1  foo  1  A
2  foo  1  B
3  bar  1  A
I would like to drop the rows where the combination of A and B is unique, i.e. I would like to keep only rows 1 and 2.
I tried the following:
# Load DataFrame
import pandas as pd
df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"], "B": [0, 1, 1, 1], "C": ["A", "A", "B", "A"]})
uniques = df[['A', 'B']].drop_duplicates()
duplicates = df[~df.index.isin(uniques.index)]
But I only get row 2, since rows 0, 1, and 3 all end up in uniques!
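Printing the intermediate frames shows why: with the default keep='first', drop_duplicates() retains the first row of every (A, B) combination, so indices 0, 1, and 3 all survive. A quick check (output as I'd expect from recent pandas):
print (uniques)
     A  B
0  foo  0
1  foo  1
3  bar  1
print (duplicates)
     A  B  C
2  foo  1  B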
Solutions for selecting all duplicated rows:
You can use duplicated with subset and the parameter keep=False to select all duplicates:
df = df[df.duplicated(subset=['A','B'], keep=False)]
print (df)
     A  B  C
1  foo  1  A
2  foo  1  B
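For reference, the underlying boolean mask marks every member of a duplicated (A, B) group, something like:
print (df.duplicated(subset=['A','B'], keep=False))
0    False
1     True
2     True
3    False
dtype: bool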
Solution with transform:
df = df[df.groupby(['A', 'B'])['A'].transform('size') > 1]
print (df)
     A  B  C
1  foo  1  A
2  foo  1  B
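The transform step broadcasts each group's size back to every row of the original frame, which is why the comparison lines up row by row. A quick check:
print (df.groupby(['A', 'B'])['A'].transform('size'))
0    1
1    2
2    2
3    1
Name: A, dtype: int64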
Slightly modified solutions for selecting all unique rows:
# invert the boolean mask with ~
df = df[~df.duplicated(subset=['A','B'], keep=False)]
print (df)
     A  B  C
0  foo  0  A
3  bar  1  A
df = df[df.groupby(['A', 'B'])['A'].transform('size') == 1]
print (df)
     A  B  C
0  foo  0  A
3  bar  1  A
I came up with a solution using groupby:
groupped = df.groupby(['A', 'B']).size().reset_index().rename(columns={0: 'count'})
uniques = groupped[groupped['count'] == 1]
# compare on the (A, B) values themselves: after reset_index(), groupped has a
# fresh 0-based index, so matching it against df.index only works by accident
duplicates = df[~df.set_index(['A', 'B']).index.isin(uniques.set_index(['A', 'B']).index)]
duplicates now has the proper result:
     A  B  C
1  foo  1  A
2  foo  1  B
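An equivalent, more compact group-based variant is GroupBy.filter, which keeps only the groups for which the function returns True and preserves the original index (a sketch on the same df):
duplicates = df.groupby(['A', 'B']).filter(lambda g: len(g) > 1)
print (duplicates)
     A  B  C
1  foo  1  A
2  foo  1  B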
Also, my original attempt in the question can be fixed by simply adding keep=False in the drop_duplicates method:
# Load Dataframe
df = pd.DataFrame({"A":["foo", "foo", "foo", "bar"], "B":[0,1,1,1], "C":["A","A","B","A"]})
uniques = df[['A', 'B']].drop_duplicates(keep=False)
duplicates = df[~df.index.isin(uniques.index)]
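With keep=False, uniques now contains only rows 0 and 3 (the (A, B) pairs that occur exactly once), so the complement is exactly the desired rows:
print (duplicates)
     A  B  C
1  foo  1  A
2  foo  1  B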
Please prefer @jezrael's answer; I think it is the safest, as I am relying on pandas indexes here.