The drop_duplicates method of a Pandas DataFrame considers all columns by default, or an optional subset of columns, when removing duplicate rows, but it cannot take the index into account.
I am looking for a clean one-line solution that considers the index together with all columns (or a subset of them) when determining duplicate rows. For example, consider the DataFrame
df = pd.DataFrame(index=['a', 'b', 'b', 'c'], data={'A': [0, 0, 0, 0], 'B': [1, 0, 0, 0]})
A B
a 0 1
b 0 0
b 0 0
c 0 0
Default use of the drop_duplicates method gives
df.drop_duplicates()
A B
a 0 1
b 0 0
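For completeness, passing the optional subset argument restricts the comparison to the named columns; every row of df has A equal to 0, so comparing on 'A' alone keeps only the first row (a minimal illustration of the subset behaviour described above):
df.drop_duplicates(subset=['A'])
A B
a 0 1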
If the index is also considered in determining duplicate rows, the result should be
df.drop_duplicates(consider_index=True) # not a supported keyword argument
A B
a 0 1
b 0 0
c 0 0
Is there a single method that provides this functionality and is cleaner than my current approach?
df['index'] = df.index
df.drop_duplicates(inplace=True)
del df['index']
Call reset_index and duplicated, and then use the resulting mask to index the original:
df = df[~df.reset_index().duplicated().values]
print(df)
A B
a 0 1
b 0 0
c 0 0
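For reference, here is the whole round trip as a self-contained sketch (assuming only pandas is installed): reset_index moves the index into a regular column, so duplicated then compares the former index together with every data column, and the resulting boolean mask indexes the original frame.
import pandas as pd

df = pd.DataFrame(index=['a', 'b', 'b', 'c'],
                  data={'A': [0, 0, 0, 0], 'B': [1, 0, 0, 0]})

# duplicated() on the reset frame flags rows whose index AND values
# repeat an earlier row; ~ inverts the mask to keep first occurrences
mask = ~df.reset_index().duplicated().values
print(df[mask])
A B
a 0 1
b 0 0
c 0 0
The same idea also chains into a single line, assuming the index is unnamed so that reset_index names the new column 'index' (rename_axis(None) just clears the index name afterwards):
df = df.reset_index().drop_duplicates().set_index('index').rename_axis(None)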