
Pandas: Drop all records of duplicate indices

I have a dataset with potentially duplicate records of the identifier appkey. The duplicated records should ideally not exist and therefore I take them to be data collection mistakes. I need to drop all instances of an appkey which occurs more than once.

The drop_duplicates method is not useful in this case (or is it?) as it either selects the first or the last of the duplicates. Is there any obvious idiom to achieve this with pandas?

asked Sep 17 '13 13:09 by asb

1 Answer

As of pandas version 0.12, we have filter for this. It does exactly what @Andy's solution does using transform, but a little more succinctly and somewhat faster. Note that the function passed to filter must return a scalar boolean, so test the group's length:

df.groupby('AppKey').filter(lambda x: len(x) == 1)

To steal @Andy's example,

In [1]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['AppKey', 'B'])

In [2]: df.groupby('AppKey').filter(lambda x: len(x) == 1)
Out[2]: 
   AppKey  B
2       5  6
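As a side note on the question's "or is it?" aside: in newer pandas releases, drop_duplicates and duplicated accept keep=False, which marks every member of a duplicated group rather than sparing the first or last occurrence. A minimal sketch, assuming a reasonably recent pandas and the same toy data as above:

```python
import pandas as pd

# Sample data mirroring the example above: AppKey 1 appears twice.
df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['AppKey', 'B'])

# groupby + filter, as in the answer (the lambda must return a scalar bool)
kept = df.groupby('AppKey').filter(lambda x: len(x) == 1)

# Equivalent vectorized form: keep=False flags ALL occurrences of a
# duplicated AppKey, so negating the mask drops every one of them.
kept_alt = df[~df['AppKey'].duplicated(keep=False)]
```

Both approaches leave only the row with AppKey 5; the duplicated-mask form avoids a Python-level callback per group, which matters on large frames.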
answered Oct 05 '22 19:10 by Dan Allan