
Pandas DataFrame: replace every occurrence of a value after the first

I have a dataframe:

col1  col2
 a     0
 b     1
 c     1
 d     0
 c     1
 d     0

In 'col2' I want to keep only the first 1 from the top and replace every later 1 with 0, so that the output is:

col1  col2
 a     0
 b     1
 c     0
 d     0
 c     0
 d     0

Thank you very much.

s900n asked Dec 06 '18 15:12


2 Answers

You can find the index of the first 1 and set others to 0:

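# Boolean mask of the rows where col2 equals 1
mask = df['col2'].eq(1)
# idxmax() gives the label of the first True; zero out every other 1
df.loc[mask & (df.index != mask.idxmax()), 'col2'] = 0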

For better performance, see "Efficiently return the index of the first value satisfying condition in array".
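As a self-contained sketch of the mask and idxmax approach in this answer, the snippet below rebuilds the example frame from the question and wraps the two lines in a helper. The name keep_first_one is hypothetical, and the sketch assumes the index labels are unique, because the comparison against idxmax() is done by label.

```python
import pandas as pd


def keep_first_one(df, col="col2"):
    """Return a copy of df in which only the first 1 in `col` is kept."""
    out = df.copy()
    mask = out[col].eq(1)              # True where the column equals 1
    if mask.any():                     # nothing to change when no 1 is present
        first_label = mask.idxmax()    # label of the first True in the mask
        out.loc[mask & (out.index != first_label), col] = 0
    return out


df = pd.DataFrame({"col1": list("abcdcd"), "col2": [0, 1, 1, 0, 1, 0]})
result = keep_first_one(df)
print(result["col2"].tolist())  # [0, 1, 0, 0, 0, 0]
```

Working on a copy leaves the original frame untouched. Drop the copy() call if modifying df in place is what you want.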

jpp answered Oct 06 '22 09:10


np.flatnonzero

Because I thought we needed more answers

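import numpy as np
# Positions of the nonzero entries; skip the first and decrement the rest (1 -> 0)
df.loc[df.index[np.flatnonzero(df.col2)[1:]], 'col2'] -= 1
df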

  col1  col2
0    a     0
1    b     1
2    c     0
3    d     0
4    c     0
5    d     0
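To see why this works, the intermediate values can be inspected one at a time. This is a minimal sketch, assuming the example frame from the question and assuming col2 only holds 0 and 1 (otherwise subtracting 1 would not give 0). The variable names are illustrative, and to_numpy() is used here only to make the conversion to an array explicit.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": list("abcdcd"), "col2": [0, 1, 1, 0, 1, 0]})

positions = np.flatnonzero(df["col2"].to_numpy())  # integer positions of nonzero entries
later = positions[1:]                              # drop the first occurrence
labels = df.index[later]                           # translate positions into index labels

print(positions.tolist())  # [1, 2, 4]
print(later.tolist())      # [2, 4]

df.loc[labels, "col2"] -= 1                        # 1 - 1 == 0 for each later occurrence
print(df["col2"].tolist())  # [0, 1, 0, 0, 0, 0]
```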

Same thing, but a little sneakier: this writes directly into the column's underlying NumPy array.

df.col2.values[np.flatnonzero(df.col2.values)[1:]] -= 1
df

  col1  col2
0    a     0
1    b     1
2    c     0
3    d     0
4    c     0
5    d     0
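Writing through .values depends on pandas handing back a writable view of the column's data. With Copy-on-Write enabled, pandas returns read-only arrays from .values, so the in-place version may raise an error there. A more conservative sketch, assuming the same example frame, does the work on an explicit NumPy copy and assigns the result back. It also sets the later entries to 0 directly instead of subtracting 1, which is a small change from the answer's code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": list("abcdcd"), "col2": [0, 1, 1, 0, 1, 0]})

values = df["col2"].to_numpy(copy=True)   # explicit, writable copy of the column
values[np.flatnonzero(values)[1:]] = 0    # zero out every nonzero entry after the first
df["col2"] = values                       # assign back through the pandas API

print(df["col2"].tolist())  # [0, 1, 0, 0, 0, 0]
```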
piRSquared answered Oct 06 '22 10:10