Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I do one-hot encoding from multiple columns

When I search for this topic, I get answers that do not match what I want to do. Let's say I have a table like this:

Item N1 N2 N3 N4
Item1 1 2 4 8
Item2 2 3 6 7
Item3 4 5 7 9
Item4 1 5 6 7
Item5 3 4 7 8

I would like to one-hot encode this to get:

Item 1 2 3 4 5 6 7 8 9
Item1 1 1 0 1 0 0 0 1 0
Item2 0 1 1 0 0 1 1 0 0
Item3 0 0 0 1 1 0 1 0 1
Item4 1 0 0 0 1 1 1 0 0
Item5 0 0 1 1 0 0 1 1 0

Is this feasible at all? I am now in the process of coding some sort of loop to go through each line but I decided to ask if anyone knows a more efficient way to do this.

like image 662
Eric Avatar asked Aug 31 '25 03:08

Eric


2 Answers

Use melt and crosstab.

tmp = df.melt('Item')
result = pd.crosstab(tmp['Item'], tmp['value']).reset_index().rename_axis(None, axis=1)

    Item  1  2  3  4  5  6  7  8  9
0  Item1  1  1  0  1  0  0  0  1  0
1  Item2  0  1  1  0  0  1  1  0  0
2  Item3  0  0  0  1  1  0  1  0  1
3  Item4  1  0  0  0  1  1  1  0  0
4  Item5  0  0  1  1  0  0  1  1  0
like image 167
Triky Avatar answered Sep 02 '25 16:09

Triky


You can try pd.get_dummies, e.g.,

pd.get_dummies(df.melt('Item'), columns=['value'], prefix='', prefix_sep='').drop('variable', axis=1).groupby('Item').sum().reset_index()

which gives

    Item  1  2  3  4  5  6  7  8  9
0  Item1  1  1  0  1  0  0  0  1  0
1  Item2  0  1  1  0  0  1  1  0  0
2  Item3  0  0  0  1  1  0  1  0  1
3  Item4  1  0  0  0  1  1  1  0  0
4  Item5  0  0  1  1  0  0  1  1  0

Data

df = pd.DataFrame(
    {
        "Item": ["Item1", "Item2", "Item3", "Item4", "Item5"],
        "N1": [1, 2, 4, 1, 3],
        "N2": [2, 3, 5, 5, 4],
        "N3": [4, 6, 7, 6, 7],
        "N4": [8, 7, 9, 7, 8],
    }
)
like image 42
ThomasIsCoding Avatar answered Sep 02 '25 17:09

ThomasIsCoding