Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove non-duplicated rows from pandas

Tags:

python

pandas

This is rather simple but I can't get me head around it. Let's say for the following data frame, I want to keep only the rows with duplicated values in column y:

>>> df
   x  y
    x   y
0   1   1
1   2   2
2   3   2
3   4   3
4   5   3
5   6   3
6   7   5
7   8   2

The desired output looks like:

>>> df
    x   y
1   2   2
2   3   2
3   4   3
4   5   3
5   6   3
7   8   2

I tried this:

df[~df.duplicated('y')]

but I get this:

    x   y
0   1   1
1   2   2
3   4   3
6   7   5
like image 273
mallet Avatar asked Aug 05 '17 22:08

mallet


People also ask

How do I delete non unique rows in pandas?

Dropping duplicate rows We can use Pandas built-in method drop_duplicates() to drop duplicate rows. Note that we started out as 80 rows, now it's 77. By default, this method returns a new DataFrame with duplicate rows removed. We can set the argument inplace=True to remove duplicates from the original DataFrame.

How do you delete repetitive rows in pandas?

Remove All Duplicate Rows from Pandas DataFrame You can set 'keep=False' in the drop_duplicates() function to remove all the duplicate rows. For E.x, df. drop_duplicates(keep=False) .

How do I extract unique rows in pandas?

And you can use the following syntax to select unique rows across specific columns in a pandas DataFrame: df = df. drop_duplicates(subset=['col1', 'col2', ...])


1 Answers

Docs: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html

keep : {‘first’, ‘last’, False}, default ‘first’

  • first : Mark duplicates as True except for the first occurrence.

  • last : Mark duplicates as True except for the last occurrence.

  • False : Mark all duplicates as True.

Meaning you are looking for:

df[df.duplicated('y',keep=False)]

Output:

    x   y
1   2   2
2   3   2
3   4   3
4   5   3
5   6   3
7   8   2
like image 132
Anton vBR Avatar answered Nov 15 '22 08:11

Anton vBR