The drop_duplicates method of a Pandas DataFrame considers all columns by default, or an optional subset of columns, when removing duplicate rows, but it cannot take the index into account.
I am looking for a clean one-line solution that considers the index together with all columns (or a subset of them) when determining duplicate rows. For example, consider the DataFrame
df = pd.DataFrame(index=['a', 'b', 'b', 'c'], data={'A': [0, 0, 0, 0], 'B': [1, 0, 0, 0]})
A B
a 0 1
b 0 0
b 0 0
c 0 0
Default use of the drop_duplicates method gives
df.drop_duplicates()
A B
a 0 1
b 0 0
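For completeness, passing the optional subset argument restricts the comparison to the named columns; every row of df has A equal to 0, so comparing on 'A' alone keeps only the first row (a minimal illustration of the subset behaviour described above):
df.drop_duplicates(subset=['A'])
A B
a 0 1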
If the index is also considered in determining duplicate rows, the result should be
df.drop_duplicates(consider_index=True) # not a supported keyword argument
A B
a 0 1
b 0 0
c 0 0
Is there a single method that provides this functionality and is cleaner than my current approach?
df['index'] = df.index
df.drop_duplicates(inplace=True)
del df['index']
Call reset_index and duplicated, and then use the resulting mask to index the original:
df = df[~df.reset_index().duplicated().values]
print(df)
A B
a 0 1
b 0 0
c 0 0
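For reference, here is the whole round trip as a self-contained sketch (assuming only pandas is installed): reset_index moves the index into a regular column, so duplicated then compares the former index together with every data column, and the resulting boolean mask indexes the original frame.
import pandas as pd

df = pd.DataFrame(index=['a', 'b', 'b', 'c'],
                  data={'A': [0, 0, 0, 0], 'B': [1, 0, 0, 0]})

# duplicated() on the reset frame flags rows whose index AND values
# repeat an earlier row; ~ inverts the mask to keep first occurrences
mask = ~df.reset_index().duplicated().values
print(df[mask])
A B
a 0 1
b 0 0
c 0 0
The same idea also chains into a single line, assuming the index is unnamed so that reset_index names the new column 'index' (rename_axis(None) just clears the index name afterwards):
df = df.reset_index().drop_duplicates().set_index('index').rename_axis(None)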