 

What is the fastest way to do inverse multi-hot encoding in pandas?

Tags: python, pandas

What is the fastest way to perform an inverse "multi-hot" operation (like one-hot encoding, but with multiple simultaneous categories per row) on a large DataFrame?

I have the following DataFrame:

id  type_A  type_B  type_C
 1       1       1       0
 2       0       1       0
 3       0       1       1

The operation would give:

id   type
 1 type_A
 1 type_B
 2 type_B
 3 type_B
 3 type_C
asked Dec 15 '25 04:12 by user50781


1 Answer

Using melt and query:

out = df.melt(id_vars='id', value_vars=['type_A', 'type_B', 'type_C']).query('value == 1')

   id variable  value
0   1   type_A      1
3   1   type_B      1
4   2   type_B      1
5   3   type_B      1
8   3   type_C      1
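melt stacks the indicator columns one after another (all type_A rows, then type_B, then type_C), so in general its output is grouped by type rather than ordered by id. In this small example the filtered rows happen to come out in id order already, but for other data a stable sort on id gives the per-id layout shown in the question while keeping the type order within each id. A self-contained sketch; the sort step is an addition, not part of the original answer:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'id': [1, 2, 3],
    'type_A': [1, 0, 0],
    'type_B': [1, 1, 1],
    'type_C': [0, 0, 1],
})

out = (
    # Without value_vars, melt uses every column not listed in id_vars
    df.melt(id_vars='id', var_name='type')
      .query('value == 1')
      .drop(columns='value')
      # Stable sort: rows with the same id keep their melt (column) order;
      # ignore_index=True renumbers the result 0..n-1
      .sort_values('id', kind='stable', ignore_index=True)
)
print(out)
```

kind='stable' is what guarantees that type_A stays ahead of type_B for the same id; a non-stable sort is free to swap them.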

To get the column names from the question, name the melted column type (var_name='type') and drop the helper value column:

df = (
    df.melt(id_vars='id', 
            value_vars=['type_A', 'type_B', 'type_C'],
            var_name='type')
      .query('value == 1')
      .drop(columns='value')
)

   id    type
0   1  type_A
3   1  type_B
4   2  type_B
5   3  type_B
8   3  type_C
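Since the question is about speed on a large DataFrame, one alternative worth benchmarking against melt (not part of the original answer, and no timing claims are made here) skips the reshape and works on the underlying NumPy array: np.nonzero returns the row and column positions of every 1, and those positions index directly into the id values and the column names. This sketch assumes that every column other than id is a 0/1 indicator; inverse_multi_hot is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def inverse_multi_hot(df, id_col='id', type_name='type'):
    """Turn 0/1 indicator columns back into one (id, type) row per 1."""
    # Assumption: every column except id_col is a 0/1 indicator column
    type_cols = df.columns.drop(id_col)
    mask = df[type_cols].to_numpy() == 1
    # np.nonzero yields positions in row-major order:
    # first by row (input row order), then by column within a row
    rows, cols = np.nonzero(mask)
    return pd.DataFrame({
        id_col: df[id_col].to_numpy()[rows],
        type_name: type_cols.to_numpy()[cols],
    })


# Sample frame from the question
df = pd.DataFrame({
    'id': [1, 2, 3],
    'type_A': [1, 0, 0],
    'type_B': [1, 1, 1],
    'type_C': [0, 0, 1],
})
out = inverse_multi_hot(df)
print(out)
```

For the sample data this returns the same five (id, type) pairs as the melt version, already in the question's row order and with a fresh 0..4 index, because np.nonzero walks the mask row by row.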
answered Dec 16 '25 21:12 by Erfan


