How can I do one-hot encoding from multiple columns

Question

When I search for this topic, I get answers that do not match what I want to do. Let's say I have a table like this:

Item	N1	N2	N3	N4
Item1	1	2	4	8
Item2	2	3	6	7
Item3	4	5	7	9
Item4	1	5	6	7
Item5	3	4	7	8

I would like to one-hot encode this to get:

Item	1	2	3	4	5	6	7	8	9
Item1	1	1	0	1	0	0	0	1	0
Item2	0	1	1	0	0	1	1	0	0
Item3	0	0	0	1	1	0	1	0	1
Item4	1	0	0	0	1	1	1	0	0
Item5	0	0	1	1	0	0	1	1	0

Is this feasible at all? I am now in the process of coding some sort of loop to go through each line but I decided to ask if anyone knows a more efficient way to do this.

Triky · Accepted Answer

Use melt and crosstab.

tmp = df.melt('Item')
result = pd.crosstab(tmp['Item'], tmp['value']).reset_index().rename_axis(None, axis=1)

    Item  1  2  3  4  5  6  7  8  9
0  Item1  1  1  0  1  0  0  0  1  0
1  Item2  0  1  1  0  0  1  1  0  0
2  Item3  0  0  0  1  1  0  1  0  1
3  Item4  1  0  0  0  1  1  1  0  0
4  Item5  0  0  1  1  0  0  1  1  0

ThomasIsCoding · Answer

You can try pd.get_dummies, e.g.,

pd.get_dummies(df.melt('Item'), columns=['value'], prefix='', prefix_sep='').drop('variable', axis=1).groupby('Item').sum().reset_index()

which gives

    Item  1  2  3  4  5  6  7  8  9
0  Item1  1  1  0  1  0  0  0  1  0
1  Item2  0  1  1  0  0  1  1  0  0
2  Item3  0  0  0  1  1  0  1  0  1
3  Item4  1  0  0  0  1  1  1  0  0
4  Item5  0  0  1  1  0  0  1  1  0

Data

df = pd.DataFrame(
    {
        "Item": ["Item1", "Item2", "Item3", "Item4", "Item5"],
        "N1": [1, 2, 4, 1, 3],
        "N2": [2, 3, 5, 5, 4],
        "N3": [4, 6, 7, 6, 7],
        "N4": [8, 7, 9, 7, 8],
    }
)

How can I do one-hot encoding from multiple columns

Tags:

python

pandas

one-hot-encoding

Eric

2 Answers

Triky

ThomasIsCoding

Recent Activity

Donate For Us

How can I do one-hot encoding from multiple columns

Tags:

python

pandas

one-hot-encoding

Eric

2 Answers

Triky

ThomasIsCoding

Related questions

Recent Activity

Donate For Us