I have a dataset:
id    url     keep_if_dup
1     A.com   Yes
2     A.com   Yes
3     B.com   No
4     B.com   No
5     C.com   No
I want to remove duplicates, i.e. keep the first occurrence of each "url" value, BUT keep duplicates if the "keep_if_dup" field is "Yes".
Expected output:
id    url     keep_if_dup
1     A.com   Yes
2     A.com   Yes
3     B.com   No
5     C.com   No
What I tried:
Dataframe=Dataframe.drop_duplicates(subset='url', keep='first')
which of course does not take the "keep_if_dup" field into account. The output is:
id    url     keep_if_dup
1     A.com   Yes
3     B.com   No
5     C.com   No
To drop duplicate columns from a pandas DataFrame, use df.T.drop_duplicates().T; this removes all columns that contain the same data, regardless of column names.
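A minimal sketch of the transpose trick, in case it helps. The frame `wide` and its column names are made up for illustration and are not part of the question's data. Note that transposing a frame with mixed dtypes turns everything into object dtype, so this reads cleanest when all columns share one type.

```python
import pandas as pd

# Hypothetical frame: column "b" holds the same values as column "a".
wide = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [1, 2, 3],
    "c": [4, 5, 6],
})

# Transpose so columns become rows, drop the duplicate rows, transpose back.
deduped = wide.T.drop_duplicates().T

print(list(deduped.columns))
```

Only the first of each group of identical columns survives, so here "b" is dropped and "a" is kept.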
Using the pandas.DataFrame.drop_duplicates() method, you can remove duplicate rows from a DataFrame. It can drop rows that are duplicated across a selected subset of columns or across all columns.
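As a rough illustration of the difference between a column subset and all columns, here is a small sketch. The frame `visits` and its "user" and "hits" columns are invented for the example.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 share the same (url, user) pair.
visits = pd.DataFrame({
    "url": ["A.com", "A.com", "A.com", "B.com"],
    "user": ["ann", "ann", "bob", "ann"],
    "hits": [1, 2, 3, 4],
})

# Duplicates judged on the (url, user) pair only; the first occurrence is kept.
by_pair = visits.drop_duplicates(subset=["url", "user"], keep="first")

# Duplicates judged on every column; no row repeats entirely, so nothing is dropped.
by_all = visits.drop_duplicates()

print(by_pair)
print(len(by_all))
```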
You can pass multiple boolean conditions to loc. The first keeps all rows where 'keep_if_dup' == 'Yes'; this is OR-ed (using |) with the inverted boolean mask of whether the 'url' column is duplicated:
In [79]:
df.loc[(df['keep_if_dup'] =='Yes') | ~df['url'].duplicated()]
Out[79]:
   id    url keep_if_dup
0   1  A.com         Yes
1   2  A.com         Yes
2   3  B.com          No
4   5  C.com          No
To overwrite your df, assign the result back:
df = df.loc[(df['keep_if_dup'] =='Yes') | ~df['url'].duplicated()]
Breaking down the above shows the 2 boolean masks:
In [80]:
~df['url'].duplicated()
Out[80]:
0     True
1    False
2     True
3    False
4     True
Name: url, dtype: bool
In [81]:
df['keep_if_dup'] =='Yes'
Out[81]:
0     True
1     True
2    False
3    False
4    False
Name: keep_if_dup, dtype: bool
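For anyone who wants to run this end to end, the two masks can be combined in a small self-contained script. This is only a sketch: the helper name `drop_dups_unless_flagged` and its parameters are hypothetical, and the frame is rebuilt from the table in the question.

```python
import pandas as pd

# The dataset from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "url": ["A.com", "A.com", "B.com", "B.com", "C.com"],
    "keep_if_dup": ["Yes", "Yes", "No", "No", "No"],
})


def drop_dups_unless_flagged(frame, key="url", flag="keep_if_dup", keep_value="Yes"):
    """Hypothetical helper: keep first occurrences of `key`, plus every flagged row."""
    keep_flagged = frame[flag] == keep_value      # rows explicitly marked to keep
    first_seen = ~frame.duplicated(subset=key)    # first occurrence of each key value
    return frame.loc[keep_flagged | first_seen]


result = drop_dups_unless_flagged(df)
print(result)
```

The comparison with "Yes" is case-sensitive, so if the column may contain "YES" or "yes" it would need normalising first, for example with `frame[flag].str.lower() == "yes"`.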