
Pandas Drop Very First Duplicate only

Tags: python, pandas

Let's say I have the following series.

s = pandas.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])

I can select the duplicates of each value, excluding its first occurrence, with the following:

s[s.duplicated(keep='first')]

I can select the duplicates of each value, excluding its last occurrence, with the following:

s[s.duplicated(keep='last')]
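
For reference, here is a small sketch of which index labels each `keep` option selects on the example series. The variable names are only illustrative, and `keep=False` is included for comparison because it flags every member of a duplicated group.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])

# keep='first' flags every repeat except the first occurrence of each value
repeats_after_first = s[s.duplicated(keep='first')]
# keep='last' flags every repeat except the last occurrence of each value
repeats_before_last = s[s.duplicated(keep='last')]
# keep=False flags every member of a duplicated group
all_dupes = s[s.duplicated(keep=False)]

print(repeats_after_first.index.tolist())  # [4, 5, 6, 10, 11, 13]
print(repeats_before_last.index.tolist())  # [3, 4, 5, 9, 10, 12]
print(all_dupes.index.tolist())            # [3, 4, 5, 6, 9, 10, 11, 12, 13]
```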

However, I'm looking to do the following.

  1. Drop only the very first duplicate and keep the other duplicates of that value, while also keeping every duplicate of the other values (including the first of each group). In the example above, we'd drop the first 3 but keep the other 3's, and keep all the remaining duplicates.
  2. Keep the first duplicate and drop the other duplicates of that value, while also keeping every duplicate of the other values. In the example above, we'd keep the first 3 but drop all the other 3's, and keep all the remaining duplicates.

I've been racking my brain using cumsum() and diff() to capture the change when a duplicate has been detected. I imagine a solution would involve this, but I can't seem to get a perfect solution. I've gone through too many truth tables right now...

jab asked Feb 04 '23 05:02


1 Answer

ind = s[s.duplicated()].index[0]

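gives you the label of the first entry flagged as a duplicate. With the default keep='first' that is the second 3 (label 4), not the first one (label 3); since the values are identical, dropping it leaves the same values. Use it to drop.

In [45]: s.drop(ind)
Out[45]:
0     0
1     1
2     2
3     3
5     3
6     3
7     4
8     5
9     6
10    6
11    6
12    7
13    7
dtype: int64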
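
If it matters that the very first occurrence itself (label 3 in the example) is the one removed, a variant using `keep=False` might look like the sketch below. The names `first_label` and `part1` are made up for illustration; this is one possible approach, not necessarily what the answer intended.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])

# keep=False flags every member of a duplicated group, so the first
# flagged label is the first occurrence of the first duplicated value
first_label = s[s.duplicated(keep=False)].index[0]
part1 = s.drop(first_label)

print(first_label)           # 3
print(part1.index.tolist())  # [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
```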

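For part 2 there must be a neater solution, but the only one I can think of is to build two boolean arrays, one marking where the index does not equal ind and one marking where the value equals the value at ind, and then combine them with np.logical_xor (with NumPy imported as np). The XOR is False only for rows that share the duplicated value but sit at a different label, so exactly those rows are dropped. Since ind is a label, look the value up with .loc rather than .iloc; the two agree here only because the series has the default integer index:

s[np.logical_xor(s.index != ind, s == s.loc[ind])]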

Out[95]:
0     0
1     1
2     2
4     3
7     4
8     5
9     6
10    6
11    6
12    7
13    7
dtype: int64
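
As a possible alternative to the XOR, the same idea can be written as a plain boolean mask: keep a row if its value differs from the duplicated value, or if it is the occurrence you want to retain. The sketch below assumes you want the very first occurrence (label 3) and uses hypothetical names `first_label`, `dup_value`, and `part2`.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])

# label and value of the first occurrence of the first duplicated value
first_label = s[s.duplicated(keep=False)].index[0]
dup_value = s.loc[first_label]

# keep rows with a different value, plus the first occurrence itself
mask = (s != dup_value) | (s.index == first_label)
part2 = s[mask]

print(part2.index.tolist())  # [0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 13]
```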
Woody Pride answered Feb 06 '23 18:02