Get dummy variables in Pandas where rows contain multiple variables as a list?

Consider a pandas DataFrame with a column 'id' in which each row holds several category labels, stored as a comma-separated string. What is an efficient way to obtain the dummy (indicator) variables, with one column per category?

Example:

Input:

import pandas as pd
df1 = pd.DataFrame({'id': ['0,1', '24,25', '1,24']})

Output:

df2 = pd.DataFrame({'0':  [1, 0, 0],
                    '1':  [1, 0, 1],
                    '24': [0, 1, 1],
                    '25': [0, 1, 0]})
asked Nov 18 '25 at 03:11 by Shree


1 Answer

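Use the .str accessor version of get_dummies (Series.str.get_dummies), which splits each string on sep and returns one 0/1 indicator column per distinct label: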

df1['id'].str.get_dummies(sep=',')

The resulting output:

   0  1  24  25
0  1  1   0   0
1  0  0   1   1
2  0  1   1   0
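If the cells hold actual Python lists rather than comma-separated strings (as the question's title suggests), str.get_dummies cannot split them directly. The sketch below assumes a list-valued column (df_lists is a hypothetical name) and shows two options: joining each list back into a string with Series.str.join before calling str.get_dummies, or using explode plus pd.get_dummies and collapsing the rows back to the original index.

```python
import pandas as pd

# Assumption: each cell holds a real Python list of labels,
# not a comma-separated string as in the question's example.
df_lists = pd.DataFrame({'id': [['0', '1'], ['24', '25'], ['1', '24']]})

# Option 1: join each list into a string, then split it with str.get_dummies.
dummies_join = df_lists['id'].str.join(',').str.get_dummies(sep=',')

# Option 2: one row per label with explode, one-hot encode the labels,
# then take the max per original row index to merge them back together.
exploded = df_lists['id'].explode()
dummies_explode = (
    pd.get_dummies(exploded)
    .groupby(level=0)
    .max()
    .astype(int)  # pd.get_dummies returns bool columns in pandas 2.x
)

print(dummies_join)
print(dummies_explode)
```

Both produce the same 0/1 table as the answer above. The join approach is the shorter one; the explode approach avoids choosing a separator that might also occur inside a label.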
answered Nov 20 '25 at 17:11 by root


