I have the following dataframe:
+------------+------------------+
| item       | categories       |
+------------+------------------+
| blue_shirt | ['red', 'white'] |
+------------+------------------+
| red_skirt  | ['blue', 'red']  |
+------------+------------------+
and I want to get this instead:
+------------+-----+-------+------+
| item       | red | white | blue |
+------------+-----+-------+------+
| blue_shirt | 1   | 1     | 0    |
+------------+-----+-------+------+
| red_skirt  | 1   | 0     | 1    |
+------------+-----+-------+------+
Here is what I tried:
orders = orders.join(pd.get_dummies(orders['categories'].explode()))
It creates the right columns, but it also creates a lot of additional rows. I want to end up with one row per item, as in the example above.
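The extra rows come from explode(): it repeats each row's index label once per list element, so the dummies frame has one row per (item, category) pair, and join() on the index then matches every duplicate. A minimal sketch of one fix, assuming the column is named 'categories' as in the table, is to collapse the repeated labels with groupby(level=0) before joining:

```python
import pandas as pd

# Sample data matching the example in the question
orders = pd.DataFrame({
    "item": ["blue_shirt", "red_skirt"],
    "categories": [["red", "white"], ["blue", "red"]],
})

# explode() repeats each row's index label once per list element,
# so this frame has one row per (item, category) pair.
dummies = pd.get_dummies(orders["categories"].explode())

# Collapse the repeated index labels back to one row per original row.
# max() yields 1 wherever any exploded element matched that column.
dummies = dummies.groupby(level=0).max().astype(int)

result = orders[["item"]].join(dummies)
print(result)
```

Using max() rather than sum() keeps the columns as 0/1 indicators even if a list contains the same category twice.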
You can also solve this problem with a one-liner using the .str accessor of pandas:
df['categories'].str.join('|').str.get_dummies()
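The one-liner returns only the indicator columns. A short sketch that rebuilds the full desired table from the example data, by joining the result back onto the 'item' column via the shared index:

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["blue_shirt", "red_skirt"],
    "categories": [["red", "white"], ["blue", "red"]],
})

# Join each list into one "|"-separated string, e.g. "red|white",
# then str.get_dummies splits on "|" and builds one column per category.
dummies = df["categories"].str.join("|").str.get_dummies()

# The dummies keep the original index, so join() lines the rows up.
result = df[["item"]].join(dummies)
print(result)
```

Since "|" is used as the separator, this assumes no category name itself contains "|"; if one might, use another separator in both str.join and str.get_dummies(sep=...).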
Each cell of the 'categories' column needs to contain a list. If it holds a string or something else, you can convert it to a list with .apply. For example, if each cell of the 'categories' column is a list saved as a string:
import ast
df['categories'].apply(ast.literal_eval).str.join('|').str.get_dummies()
ast.literal_eval only parses Python literals, so unlike eval it will not execute arbitrary code contained in the data.
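A common way to end up with lists stored as strings is a round trip through CSV. The sketch below reproduces that with an in-memory buffer (io.StringIO, used here only for illustration) and parses the strings back with ast.literal_eval from the standard library:

```python
import ast
import io

import pandas as pd

df = pd.DataFrame({
    "item": ["blue_shirt", "red_skirt"],
    "categories": [["red", "white"], ["blue", "red"]],
})

# Writing to CSV and reading back turns each list into its string form,
# e.g. "['red', 'white']".
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
loaded = pd.read_csv(buf)

# Parse each string back into a real Python list.
loaded["categories"] = loaded["categories"].apply(ast.literal_eval)

dummies = loaded["categories"].str.join("|").str.get_dummies()
result = loaded[["item"]].join(dummies)
print(result)
```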