
pandas | list in column to binary column

I have the following dataframe:

+------------+------------------+
| item       | categories       |
+------------+------------------+
| blue_shirt | ['red', 'white'] |
+------------+------------------+
| red_skirt  | ['blue', 'red']  |
+------------+------------------+

and I want to get this instead:

+------------+-----+-------+------+
| item       | red | white | blue |
+------------+-----+-------+------+
| blue_shirt | 1   | 1     | 0    |
+------------+-----+-------+------+
| red_skirt  | 1   | 0     | 1    |
+------------+-----+-------+------+

Here is what I tried:

orders = orders.join(pd.get_dummies(orders['Categories'].explode()))

It creates the right columns, but it also creates a lot of additional rows. I want one row per item in the end, as in the example above.

mokiliii Lo asked Oct 26 '25 12:10



1 Answer

You can also solve this with a one-liner using the pandas .str accessor:

df['categories'].str.join('|').str.get_dummies()
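The one-liner returns only the indicator columns, so to get the table from the question you would still need to attach them to the item column. A minimal sketch, assuming the sample data from the question and a default integer index (the names dummies and result are just illustrative):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "item": ["blue_shirt", "red_skirt"],
    "categories": [["red", "white"], ["blue", "red"]],
})

# Join each list into a '|'-separated string, then split it into 0/1 columns.
# '|' is the default separator of str.get_dummies.
dummies = df["categories"].str.join("|").str.get_dummies()

# Attach the indicator columns to the item column (aligned on the index).
result = df[["item"]].join(dummies)
print(result)
```

Note that the indicator columns come out in sorted order (blue, red, white), not in the order they first appear in the lists. This also assumes no category name contains the '|' character.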

Each cell of the 'categories' column needs to hold a list. If it holds a string or something else, you can convert it to a list with .apply. For example, if the 'categories' column contains lists saved as strings:

import ast
df['categories'].apply(ast.literal_eval).str.join('|').str.get_dummies()
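As for why the attempt in the question produces extra rows: explode() repeats the original index label once per list element, so joining on that index matches each item with several dummy rows. If you prefer to keep the explode approach, one possible fix is to collapse the dummies back to one row per index label before joining. A sketch under the same sample-data assumption (the lowercase column name and variable names are illustrative):

```python
import pandas as pd

# Sample data copied from the question.
orders = pd.DataFrame({
    "item": ["blue_shirt", "red_skirt"],
    "categories": [["red", "white"], ["blue", "red"]],
})

# explode() yields one row per list element and repeats the index label.
exploded = orders["categories"].explode()

# Build 0/1 columns, then take the max per original index label so each
# item ends up with a single row again.
dummies = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()

result = orders[["item"]].join(dummies)
print(result)
```

Using max() keeps the values at 0/1 even if a category is repeated within one list; sum() would count occurrences instead.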
user3486942 answered Oct 29 '25 01:10



