 

Converting event open/close rows into a one-hot encoding of active events in Python

I have a pandas DataFrame with two columns, "type" and "sign", as follows:

    type    sign
0   open    A
1   open    B
2   open    D
3   close   B
4   close   D
5   open    B
6   close   B
7   close   A
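
For anyone who wants to reproduce this, the sample frame above can be built roughly as follows (a minimal sketch; only the eight rows shown are included):

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "type": ["open", "open", "open", "close", "close", "open", "close", "close"],
    "sign": ["A", "B", "D", "B", "D", "B", "B", "A"],
})
print(df)
```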

"A" + "open" means that event A has started. "A" + "close" means that event A has stopped. I need a vectorized way to encode this, because speed is a major concern (the real data is about 40 million rows long). It is similar to one-hot encoding, but I need a 1 in an event's column if and only if that event is currently active. For this example, the result should be:

    A   B   C   D   type    sign
0   1   0   0   0   open    A
1   1   1   0   0   open    B
2   1   1   0   1   open    D
3   1   0   0   1   close   B
4   1   0   0   0   close   D
5   1   1   0   0   open    B
6   1   0   0   0   close   B
7   0   0   0   0   close   A
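
Spelled out as a plain Python loop, the rule would look roughly like the sketch below. It is only meant to pin down the expected behaviour on the sample data and is likely far too slow for 40 million rows. The names `events`, `active` and `expected` are illustrative, and the full event list A to D is assumed to be known up front (C never occurs in the sample).

```python
import pandas as pd

df = pd.DataFrame({
    "type": ["open", "open", "open", "close", "close", "open", "close", "close"],
    "sign": ["A", "B", "D", "B", "D", "B", "B", "A"],
})

events = list("ABCD")  # assumed: the complete list of possible events

active = set()  # events that are currently open
rows = []
for kind, sign in zip(df["type"], df["sign"]):
    if kind == "open":
        active.add(sign)
    else:
        active.discard(sign)
    # One 0/1 flag per event, in the fixed column order of `events`.
    rows.append([int(e in active) for e in events])

expected = pd.DataFrame(rows, columns=events, index=df.index)
print(expected.join(df))
```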

Any ideas? Thanks.

Guy Barash asked Feb 10 '26 21:02



1 Answer

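IIUC, you can build one indicator column per event with get_dummies, multiply each row by +1 for "open" and -1 for "close", and then take the cumsum: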

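# +1 when an event opens, -1 when it closes
step = df['type'].map({'open': 1, 'close': -1})
# One 0/1 column per event; reindex adds events that never occur (here C)
s = (df['sign'].str.get_dummies()
       .reindex(columns=list('ABCD'), fill_value=0)
       .mul(step, axis=0)
       .cumsum())
print(s)

Output:

   A  B  C  D
0  1  0  0  0
1  1  1  0  0
2  1  1  0  1
3  1  0  0  1
4  1  0  0  0
5  1  1  0  0
6  1  0  0  0
7  0  0  0  0

# Attach the indicator columns (use s.join(df) to put them first, as in the question)
df = df.join(s)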
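
If memory becomes a concern at 40 million rows, one possible variant does the same signed-indicator-plus-cumsum trick directly in NumPy with `int8` arrays. This is only a sketch under a few assumptions that are not stated in the question: the full event list (`events`, a name chosen here) is known in advance, every value in "sign" appears in it, and an event is never opened twice without a close in between, so the running count stays at 0 or 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "type": ["open", "open", "open", "close", "close", "open", "close", "close"],
    "sign": ["A", "B", "D", "B", "D", "B", "B", "A"],
})

events = list("ABCD")  # assumed: the complete list of possible events

# Column position of each row's event; unknown signs would get code -1.
codes = pd.Categorical(df["sign"], categories=events).codes
if (codes < 0).any():
    raise ValueError("found a sign that is not listed in `events`")

# +1 for "open", -1 for anything else (i.e. "close" in this data).
delta = np.where(df["type"].eq("open").to_numpy(), 1, -1).astype(np.int8)

# Signed one-hot matrix: each row has a single +1 or -1 in its event's column.
steps = np.zeros((len(df), len(events)), dtype=np.int8)
steps[np.arange(len(df)), codes] = delta

# The running sum down the rows gives 1 while an event is active, 0 otherwise.
active = steps.cumsum(axis=0, dtype=np.int8)

result = df.join(pd.DataFrame(active, columns=events, index=df.index))
print(result)
```

On the sample data this yields the same A to D columns as the get_dummies version. If an event can be opened repeatedly before being closed, the counts would exceed 1 and would need to be clipped or handled separately.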
BENY answered Feb 12 '26 14:02



