What is the fastest way to perform an inverse "multi-hot" operation (like one-hot encoding, but with multiple simultaneous categories) on a large DataFrame?
I have the following DataFrame:
id type_A type_B type_C
1 1 1 0
2 0 1 0
3 0 1 1
The operation would give:
id type
1 type_A
1 type_B
2 type_B
3 type_B
3 type_C
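To make the example reproducible, the input and the desired output can be built directly. This is a minimal sketch; the names df and expected are just for illustration:

```python
import pandas as pd

# Input: one row per id, one 0/1 column per category ("multi-hot")
df = pd.DataFrame({
    'id': [1, 2, 3],
    'type_A': [1, 0, 0],
    'type_B': [1, 1, 1],
    'type_C': [0, 0, 1],
})

# Desired result: one row per (id, category) pair whose flag is 1
expected = pd.DataFrame({
    'id': [1, 1, 2, 3, 3],
    'type': ['type_A', 'type_B', 'type_B', 'type_B', 'type_C'],
})

print(df)
print(expected)
```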
Using melt and query:
df = df.melt(id_vars='id', value_vars=['type_A', 'type_B', 'type_C']).query('value == 1')
id variable value
0 1 type_A 1
3 1 type_B 1
4 2 type_B 1
5 3 type_B 1
8 3 type_C 1
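The same filter can also be written with a boolean mask instead of query. It keeps the same rows and the original melt index (0, 3, 4, 5, 8) and avoids evaluating a string expression. No speed difference is claimed here; this is a sketch to compare against:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'type_A': [1, 0, 0],
    'type_B': [1, 1, 1],
    'type_C': [0, 0, 1],
})

# melt produces every (id, category) pair, column by column
long = df.melt(id_vars='id', value_vars=['type_A', 'type_B', 'type_C'])

# Boolean indexing keeps the same rows as .query('value == 1')
result = long[long['value'] == 1]
print(result)
```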
To get the desired column names, pass var_name='type' to melt and drop the value column:
df = (
    df.melt(id_vars='id',
            value_vars=['type_A', 'type_B', 'type_C'],
            var_name='type')
      .query('value == 1')
      .drop(columns='value')
)
id type
0 1 type_A
3 1 type_B
4 2 type_B
5 3 type_B
8 3 type_C
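For a large DataFrame, one possible alternative (not benchmarked here) is to skip the intermediate melted frame. melt first materializes rows × categories entries and then filters them, whereas numpy.nonzero returns only the (row, column) positions of the non-zero flags. This sketch assumes the flag columns contain only 0/1 values (any non-zero value counts as a hit):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'type_A': [1, 0, 0],
    'type_B': [1, 1, 1],
    'type_C': [0, 0, 1],
})

type_cols = ['type_A', 'type_B', 'type_C']
flags = df[type_cols].to_numpy()

# Positions of every non-zero flag, in row-major order,
# so the output is grouped by id (melt's output is grouped by column)
rows, cols = np.nonzero(flags)

result = pd.DataFrame({
    'id': df['id'].to_numpy()[rows],
    'type': np.asarray(type_cols)[cols],
})
print(result)
```

Unlike the melt version, this produces a fresh 0..n-1 index and orders rows by id first, which matches the desired output shown in the question.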