Let's say I have the following series.
s = pandas.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])
I can select every duplicated value except its first occurrence with the following
s[s.duplicated(keep='first')]
I can select every duplicated value except its last occurrence with the following
s[s.duplicated(keep='last')]
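To make the behaviour concrete, here is a small sketch of what those two selections return on the example series:

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])

# keep='first' marks every occurrence *after* the first as a duplicate,
# so selecting on the mask returns the later occurrences
later = s[s.duplicated(keep='first')]   # indices 4, 5, 6, 10, 11, 13

# keep='last' marks every occurrence *before* the last as a duplicate,
# so selecting on the mask returns the earlier occurrences
earlier = s[s.duplicated(keep='last')]  # indices 3, 4, 5, 9, 10, 12
```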
However, I'm looking to do the following:

1. Drop the first duplicate 3, but keep the other 3's. Keep all other remaining duplicates.
2. Keep the first duplicate 3, but drop all other 3's. Keep all other remaining duplicates.

I've been racking my brain using cumsum()
and diff()
to capture the change when a duplicate has been detected. I imagine a solution would involve these, but I can't seem to get a perfect one. I've gone through too many truth tables at this point...
ind = s[s.duplicated()].index[0]
gives you the first index where a record is duplicated. Use it to drop.
In [45]: s.drop(ind)
Out[45]:
0 0
1 1
2 2
3 3
5 3
6 3
7 4
8 5
9 6
10 6
11 6
12 7
13 7
dtype: int64
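For completeness, part 1 end to end as a self-contained, runnable sketch (same approach as above, nothing new):

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])

# duplicated() defaults to keep='first', so the first index it flags
# is the second 3, at position 4
ind = s[s.duplicated()].index[0]

# Drop only that row; the original 3 (index 3) and the later 3's survive,
# as do all other duplicates
result = s.drop(ind)
```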
For part 2, there must be a neater solution, but the only one I can think of is to create a boolean series indicating where the index does not equal ind, another indicating where the value equals s.iloc[ind], and then combine them with np.logical_xor (numpy imported as np):
s[np.logical_xor(s.index != ind, s==s.iloc[ind])]
Out[95]:
0 0
1 1
2 2
4 3
7 4
8 5
9 6
10 6
11 6
12 7
13 7
dtype: int64
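As a side note, the XOR condition can be rewritten without numpy: a row survives exactly when its value differs from s.iloc[ind], or it is the row at ind itself. A sketch of that equivalent form, with ind computed as above:

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])
ind = s[s.duplicated()].index[0]  # 4, the first duplicated position

# Equivalent to the logical_xor mask: keep every row whose value differs
# from s.iloc[ind], plus the single row at ind
result = s[(s != s.iloc[ind]) | (s.index == ind)]
```

This keeps one 3 (at index 4), drops the other 3's, and leaves every other duplicate untouched, matching the output above.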