pandas

Question

I have the following dataframe:

+------------+------------------+
| item       | categories       |
+------------+------------------+
| blue_shirt | ['red', 'white'] |
+------------+------------------+
| red_skirt  | ['blue', 'red']  |
+------------+------------------+

and I want to get this instead:

+------------+-----+-------+------+
| item       | red | white | blue |
+------------+-----+-------+------+
| blue_shirt | 1   | 1     | 0    |
+------------+-----+-------+------+
| red_skirt  | 1   | 0     | 1    |
+------------+-----+-------+------+

Here is what I tried:

orders = orders.join(pd.get_dummies(orders['Categories'].explode()))

It creates the right columns, but it also creates a lot of additional rows. I want to end up with one row per item, as in the example above.

user3486942 · Accepted Answer

You can also solve this problem with a one-liner using the pandas .str accessor:

df['categories'].str.join('|').str.get_dummies()
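The one-liner returns only the indicator columns, so to get the layout from the question you would still need to attach them to the item column. A minimal sketch, assuming the question's data is rebuilt by hand in a frame called df (the names dummies and result are just for illustration):

```python
import pandas as pd

# Sample data rebuilt from the question; each 'categories' cell is a real Python list.
df = pd.DataFrame({
    "item": ["blue_shirt", "red_skirt"],
    "categories": [["red", "white"], ["blue", "red"]],
})

# Join each list into a '|'-separated string, then split it into 0/1 indicator columns.
# '|' is the default separator of str.get_dummies.
dummies = df["categories"].str.join("|").str.get_dummies()

# Attach the indicators to the item column. The index is unchanged,
# so there is still exactly one row per item.
result = df[["item"]].join(dummies)
print(result)
```

The indicator columns come out in alphabetical order (blue, red, white) rather than in order of first appearance. This also assumes no category name contains '|'; if one might, a different separator can be passed to both .str.join(...) and .str.get_dummies(sep=...).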

Each cell of the 'categories' column needs to be a list. If it is a string or something else, you can convert it to a list with .apply. For example, if the 'categories' column holds lists saved as strings:

df['categories'].apply(lambda x: eval(x)).str.join('|').str.get_dummies()
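Note that eval will execute whatever code the string contains, so if the data comes from a file or another untrusted source, ast.literal_eval from the standard library is a safer way to parse it, since it only accepts Python literals. A sketch of the same conversion, assuming the cells look like "['red', 'white']" (the sample frame and the names parsed and result are made up for illustration):

```python
import ast

import pandas as pd

# Here each 'categories' cell is a string that merely looks like a list.
df = pd.DataFrame({
    "item": ["blue_shirt", "red_skirt"],
    "categories": ["['red', 'white']", "['blue', 'red']"],
})

# literal_eval parses Python literals only (lists, strings, numbers, ...)
# and raises an error on anything else instead of executing it.
parsed = df["categories"].apply(ast.literal_eval)

# From here on it is the same one-liner as before.
result = df[["item"]].join(parsed.str.join("|").str.get_dummies())
print(result)
```

If some cells are not valid literals, literal_eval raises an exception (typically ValueError or SyntaxError), so those rows need to be cleaned up first.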

pandas | list in column to binary column

Tags:

python

dataframe

mokiliii Lo
