Convert list to binary values using one-hot encoding

I have one column in a CSV file. Each cell in the column holds multiple values in a list. For example, one cell contains ['A', 'B', 'C'] and another contains ['B', 'D']. I want to apply one-hot encoding to this column to convert it to binary values for use in machine learning.

How can I do that?

Raghav asked Feb 26 '26 21:02


1 Answer

The input is a CSV file, so the cells contain strings, not lists. Strip the surrounding [] and use Series.str.get_dummies, then remove the ' quotes around the resulting column names:

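import pandas as pd
# Strip the brackets, split each cell on ', ' into 0/1 columns, then drop the quotes from the column names
df = df['col'].str.strip('[]').str.get_dummies(', ')
df.columns = df.columns.str.strip("'")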
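To try this end to end, the sketch below reads a small in-memory CSV built from the two cells in the question. The column name col (taken from the answer) and the sample rows are assumptions; with real data you would call pd.read_csv on your own file instead.

```python
import io

import pandas as pd

# Hypothetical CSV matching the question: one column named 'col' (assumed name).
# Cells that contain commas must be quoted in CSV.
csv_text = '''col
"['A', 'B', 'C']"
"['B', 'D']"
'''

df = pd.read_csv(io.StringIO(csv_text))

# read_csv returns plain strings here, not Python lists
assert isinstance(df.loc[0, 'col'], str)

# Strip the brackets, split on ', ' into indicator columns, then drop the quotes
dummies = df['col'].str.strip('[]').str.get_dummies(', ')
dummies.columns = dummies.columns.str.strip("'")
print(dummies)
```

Each row gets a 1 in the column of every label its cell contains, so B is 1 in both rows while A, C and D are 1 in only one.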

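For better performance, convert the strings to real Python lists first (for example with ast.literal_eval) and use MultiLabelBinarizer:

import ast
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Parse each string such as "['A', 'B']" into a real list first;
# raw strings would otherwise be binarized character by character
mlb = MultiLabelBinarizer()
df = pd.DataFrame(mlb.fit_transform(df['col'].apply(ast.literal_eval)), columns=mlb.classes_)
print(df)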
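To show what the MultiLabelBinarizer step produces, here is a small pure-Python sketch of the same idea: parse each cell with ast.literal_eval, collect the sorted distinct labels, and emit one 0/1 column per label. It only illustrates the concept and is not scikit-learn's implementation; the sample cells and the helper name binarize are hypothetical.

```python
import ast

# Raw cell values as they come out of the CSV (hypothetical sample from the question)
raw_cells = ["['A', 'B', 'C']", "['B', 'D']"]

# Step 1: parse each string into a real Python list of labels.
# Without this, iterating a raw string would yield characters like '[' and ','.
label_lists = [ast.literal_eval(cell) for cell in raw_cells]

# Step 2: the sorted set of distinct labels becomes the column order ("classes")
classes = sorted({label for labels in label_lists for label in labels})


def binarize(label_lists, classes):
    """Hypothetical helper: one 0/1 row per cell, one column per class."""
    index = {c: i for i, c in enumerate(classes)}
    rows = []
    for labels in label_lists:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1
        rows.append(row)
    return rows


matrix = binarize(label_lists, classes)
print(classes)
for row in matrix:
    print(row)
```

The resulting matrix has the same layout as the get_dummies result: columns A, B, C, D and one row per original cell.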
jezrael answered Feb 28 '26 11:02