When I search for this topic, I get answers that do not match what I want to do. Let's say I have a table like this:
Item | N1 | N2 | N3 | N4 |
---|---|---|---|---|
Item1 | 1 | 2 | 4 | 8 |
Item2 | 2 | 3 | 6 | 7 |
Item3 | 4 | 5 | 7 | 9 |
Item4 | 1 | 5 | 6 | 7 |
Item5 | 3 | 4 | 7 | 8 |
I would like to one-hot encode this to get:
Item | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
Item1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
Item2 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
Item3 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
Item4 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
Item5 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
Is this feasible at all? I am now in the process of coding some sort of loop to go through each line but I decided to ask if anyone knows a more efficient way to do this.
Use melt and crosstab.
tmp = df.melt('Item')
result = pd.crosstab(tmp['Item'], tmp['value']).reset_index().rename_axis(None, axis=1)
Item 1 2 3 4 5 6 7 8 9
0 Item1 1 1 0 1 0 0 0 1 0
1 Item2 0 1 1 0 0 1 1 0 0
2 Item3 0 0 0 1 1 0 1 0 1
3 Item4 1 0 0 0 1 1 1 0 0
4 Item5 0 0 1 1 0 0 1 1 0
You can try pd.get_dummies
, e.g.,
pd.get_dummies(df.melt('Item'), columns=['value'], prefix='', prefix_sep='').drop('variable', axis=1).groupby('Item').sum().reset_index()
which gives
Item 1 2 3 4 5 6 7 8 9
0 Item1 1 1 0 1 0 0 0 1 0
1 Item2 0 1 1 0 0 1 1 0 0
2 Item3 0 0 0 1 1 0 1 0 1
3 Item4 1 0 0 0 1 1 1 0 0
4 Item5 0 0 1 1 0 0 1 1 0
Data
df = pd.DataFrame(
{
"Item": ["Item1", "Item2", "Item3", "Item4", "Item5"],
"N1": [1, 2, 4, 1, 3],
"N2": [2, 3, 5, 5, 4],
"N3": [4, 6, 7, 6, 7],
"N4": [8, 7, 9, 7, 8],
}
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With