Encode multiple labels in a DataFrame

Given a list of lists, in which each sublist is a bucket of letters, like:

L=[['a','c'],['b','e'],['d']]

I would like to encode each sublist as one row in my DataFrame like this:

    a   b   c   d   e
0   1   0   1   0   0
1   0   1   0   0   1
2   0   0   0   1   0

Assume the letters only range from 'a' to 'e'. How can I write a function that does this?

Garvey asked May 26 '26 11:05


1 Answer

You can use MultiLabelBinarizer from the scikit-learn library:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

L = [['a', 'c'], ['b', 'e'], ['d']]

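# Learns the set of labels and builds one 0/1 column per label
mlb = MultiLabelBinarizer()

# classes_ holds the sorted labels, used here as the column names
res = pd.DataFrame(mlb.fit_transform(L),
                   columns=mlb.classes_)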

print(res)

   a  b  c  d  e
0  1  0  1  0  0
1  0  1  0  0  1
2  0  0  0  1  0
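
Since the question asks for a function and says the letters only run from 'a' to 'e', here is a minimal sketch that wraps the same approach. The name encode_buckets and its letters parameter are hypothetical, made up for illustration. It relies on the classes argument of MultiLabelBinarizer to fix the columns up front, so a letter that never appears in the input should still get an all-zero column.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer


def encode_buckets(buckets, letters=("a", "b", "c", "d", "e")):
    """Return one 0/1 row per bucket, with one column per entry in `letters`.

    `encode_buckets` and `letters` are illustrative names, not part of
    pandas or scikit-learn.
    """
    # Passing `classes` fixes the column set and order in advance, instead of
    # inferring them from whatever labels happen to occur in `buckets`.
    mlb = MultiLabelBinarizer(classes=list(letters))
    return pd.DataFrame(mlb.fit_transform(buckets), columns=mlb.classes_)


# 'd' and 'e' do not occur in this input, yet their columns are still present.
res = encode_buckets([["a", "c"], ["b"]])
print(res)
```

Without classes, the columns are whatever labels occur in the data, so two different inputs could produce DataFrames with different columns. As far as I know, labels outside the given classes are dropped with a warning rather than raising an error, so validate the input first if that matters.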

jpp answered May 30 '26 03:05


