 

How to drop duplicates from a subset of rows in a pandas dataframe?

Tags:

python

pandas

I have a dataframe like this:

A   B       C
12  true    1
12  true    1
3   nan     2
3   nan     3

I would like to drop all rows where the value of column A is a duplicate, but only if the value of column B is 'true'.

The resulting dataframe I have in mind is:

A   B       C
12  true    1
3   nan     2
3   nan     3

I tried using: df.loc[df['B']=='true'].drop_duplicates('A', inplace=True, keep='first') but it doesn't seem to work.
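A likely reason the attempt has no visible effect, sketched with the question's data (assuming B holds the boolean True and NaN rather than strings): df.loc[mask] returns a new DataFrame, and inplace=True modifies that temporary object and returns None, so df itself never changes. pandas may also emit a SettingWithCopyWarning here.

```python
import numpy as np
import pandas as pd

# Question's sample data; assumes B contains the boolean True and NaN.
df = pd.DataFrame({"A": [12, 12, 3, 3], "B": [True, True, np.nan, np.nan], "C": [1, 1, 2, 3]})

# df.loc[mask] builds a new DataFrame; inplace=True then modifies that
# temporary object and returns None, so df itself is left untouched.
result = df.loc[df["B"].eq(True)].drop_duplicates("A", keep="first", inplace=True)

print(result)   # None
print(len(df))  # 4, the original frame still has every row
```

Assigning the result of a non-inplace call back to a variable avoids the problem, but the filtered result then only contains the True rows, which is why the answers below recombine it with the other rows.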

Thanks for your help!

Tatsuya asked Feb 22 '18 19:02


People also ask

How do you drop duplicate rows in Pandas based on a column?

The pandas drop_duplicates method has a subset argument that specifies which columns are used to identify duplicates. For example, to remove duplicate rows based on the column 'continent', pass subset='continent' (or a list of column names to compare several columns).
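A minimal sketch of the subset argument. The country/continent data below is hypothetical; only the column name 'continent' comes from the text above.

```python
import pandas as pd

# Hypothetical data for illustration.
countries = pd.DataFrame({
    "country": ["France", "Germany", "Japan", "Kenya", "China"],
    "continent": ["Europe", "Europe", "Asia", "Africa", "Asia"],
})

# Keep the first row for each continent; the 'country' column is ignored
# when deciding what counts as a duplicate.
one_per_continent = countries.drop_duplicates(subset="continent", keep="first")
print(one_per_continent)
```

keep also accepts "last" (keep the final occurrence) or False (drop every row whose value is repeated).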

What does drop_duplicates do in pandas?

The DataFrame.drop_duplicates() method removes duplicate rows. By default all columns are compared; use the subset parameter if only some of the columns should be considered when looking for duplicates.


2 Answers

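You can use pd.concat: split the df by the value of B, drop duplicates of A only in the part where B is True, then concatenate the parts and sort by index to restore the original row order.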

import pandas as pd
df = pd.concat([df.loc[df.B != True], df.loc[df.B == True].drop_duplicates(['A'], keep='first')]).sort_index()
df

Out[1593]: 
    A     B  C
0  12  True  1
2   3   NaN  2
3   3   NaN  3
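An end-to-end version of the concat approach, written out step by step. It assumes, as the output above suggests, that B holds the boolean True and NaN; if B actually contains the string 'true', compare with df["B"].eq("true") instead.

```python
import numpy as np
import pandas as pd

# Sample data from the question (B assumed to be boolean True / NaN).
df = pd.DataFrame({"A": [12, 12, 3, 3], "B": [True, True, np.nan, np.nan], "C": [1, 1, 2, 3]})

is_true = df["B"].eq(True)        # rows eligible for de-duplication
untouched = df.loc[~is_true]      # every other row is kept as-is
deduped = df.loc[is_true].drop_duplicates(subset=["A"], keep="first")

# Recombine the two parts and restore the original row order.
result = pd.concat([untouched, deduped]).sort_index()
print(result)
```

Splitting by B first means duplicates of A are only searched for among the True rows; rows with any other B value are never removed.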
BENY answered Sep 16 '22 12:09


df[df.B.ne(True) | ~df.A.duplicated()]

    A     B  C
0  12  True  1
2   3   NaN  2
3   3   NaN  3
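A sketch unpacking the one-line mask, wrapped in a hypothetical helper keep_mask. It also shows a hypothetical edge case where this mask and the concat answer disagree: df["A"].duplicated() looks at all rows, so a True row is dropped when its A value already appeared in an earlier non-True row. The keep_mask_true_only variant (also hypothetical, not from either answer) compares A together with B, so a True row is only dropped when an earlier row with the same A also has B equal to True, which matches the concat approach for data like this.

```python
import numpy as np
import pandas as pd

def keep_mask(df: pd.DataFrame) -> pd.Series:
    # Keep a row if B is not True, or if its A value has not appeared earlier
    # in the frame (duplicated() flags every repeat after the first occurrence).
    return df["B"].ne(True) | ~df["A"].duplicated()

def keep_mask_true_only(df: pd.DataFrame) -> pd.Series:
    # Keep a row if B is not True, or if no earlier row shares both its A and
    # its B value, i.e. no earlier True row has the same A.
    return df["B"].ne(True) | ~df.duplicated(subset=["A", "B"])

df = pd.DataFrame({"A": [12, 12, 3, 3], "B": [True, True, np.nan, np.nan], "C": [1, 1, 2, 3]})
print(df[keep_mask(df)])            # rows 0, 2, 3

# Hypothetical edge case: A=5 first appears with B=NaN, then once with B=True.
edge = pd.DataFrame({"A": [5, 5], "B": [np.nan, True], "C": [1, 2]})
print(edge[keep_mask(edge)])        # only row 0 is kept
print(edge[keep_mask_true_only(edge)])  # both rows are kept
```

Which behaviour is correct depends on whether "duplicate" should mean a repeat anywhere in column A or only a repeat among the rows where B is True.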
piRSquared answered Sep 19 '22 12:09