I have the following 2 dataframes:
df1
product_ID tags
100 chocolate, sprinkles
101 chocolate, filled
102 glazed
df2
customer product_ID
A 100
A 101
B 101
C 100
C 102
B 101
A 100
C 102
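For anyone who wants to reproduce this, the two frames above could be built roughly as follows. This is only a sketch: storing `product_ID` as integers is an assumption, since the post only shows the printed values.

```python
import pandas as pd

# Product catalogue: one row per product, tags as a single ", "-separated string
df1 = pd.DataFrame({
    "product_ID": [100, 101, 102],  # assumed to be integers
    "tags": ["chocolate, sprinkles", "chocolate, filled", "glazed"],
})

# Purchase log: one row per purchase, in the order listed above
df2 = pd.DataFrame({
    "customer": ["A", "A", "B", "C", "C", "B", "A", "C"],
    "product_ID": [100, 101, 101, 100, 102, 101, 100, 102],
})
```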
I would like to create a new dataframe like this:
| customer | chocolate | sprinkles | filled | glazed |
|----------|-----------|-----------|--------|--------|
| A | ? | ? | ? | ? |
| B | ? | ? | ? | ? |
| C | ? | ? | ? | ? |
Each cell should hold the number of times that product attribute occurs across the customer's purchases.
I've used merge and dropped the product_ID column, which gives the following result:
df3 = pd.merge(df2, df1)
df3 = df3.drop(['product_ID'], axis=1)
customer tags
A chocolate, sprinkles
C chocolate, sprinkles
A chocolate, sprinkles
A chocolate, filled
B chocolate, filled
B chocolate, filled
C glazed
C glazed
How do we get to the final result from here? Thanks in advance!
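Use str.get_dummies on the merged frame, splitting on ', ' (comma plus space) so that no tag keeps a leading space, then sum the indicator columns per customer:
df3.set_index('customer').tags.str.get_dummies(sep=', ').groupby(level=0).sum()
Out[593]:
chocolate filled glazed sprinkles
customer
A 3 1 0 2
B 2 2 0 0
C 1 0 2 1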
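An alternative that some may find easier to read is to split the tags into one row per tag and cross-tabulate. The sketch below is self-contained and rebuilds the sample data from the question. It assumes the tags are always separated by exactly `", "`, and the names `merged`, `long` and `counts` are only illustrative.

```python
import pandas as pd

# Sample data from the question (product_ID assumed to be integers)
df1 = pd.DataFrame({
    "product_ID": [100, 101, 102],
    "tags": ["chocolate, sprinkles", "chocolate, filled", "glazed"],
})
df2 = pd.DataFrame({
    "customer": ["A", "A", "B", "C", "C", "B", "A", "C"],
    "product_ID": [100, 101, 101, 100, 102, 101, 100, 102],
})

# Attach the tag string to every purchase
merged = df2.merge(df1, on="product_ID", how="left")

# Split each tag string into a list, then emit one row per (purchase, tag)
long = merged.assign(tag=merged["tags"].str.split(", ")).explode("tag")

# Count how often each tag occurs for each customer
counts = pd.crosstab(long["customer"], long["tag"])
print(counts)
```

For the sample data this yields the same counts as the get_dummies approach. If the column order from the question matters, selecting `counts[["chocolate", "sprinkles", "filled", "glazed"]]` reorders the columns.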