I have the two following input dataframes :
df_item_in_cat
a b c d
0 A A B A
1 B B E B
2 C E F E
3 D F G G
df_charac_by_cat
cat charac
0 a 10
1 b 20
2 c 25
3 d 15
And I would like to produce the below dataframe :
df
item cat_nb sum_charac
0 A 3 45
1 B 4 70
2 C 1 10
3 D 1 10
4 E 2 45
5 F 2 45
6 G 2 40
where :
If needed, I can get the Serie :
s_items
0 A
1 B
2 C
3 D
4 E
5 F
6 G
To be more concrete, I have products that belong to categories, and I have shops that are associated with a category. And I want to know in how many shops each product is available, based on the caterogy of each product, and the category of each shop. Hope it's clear ..
ex : product A that is in 3 categories (a,b,d), is available in 45 shops, because 10+20+15 = 45
It's like a "vlookup" in excel.
I'm pretty sure I could do it with an iterrows, but I'm looking for a "beautiful" solution.
I'd like to show you some researchs, but I really have no idea how to do that .. I tried with merges, or pivot, but it's not really usefull.
You need:
df_item_in_cat.melt().merge(df_charac_by_cat, left_on='variable', right_on='cat')\
.groupby('value')['charac'].agg(['count','sum'])
Output:
value count sum
0 A 3 45
1 B 4 70
2 C 1 10
3 D 1 10
4 E 3 60
5 F 2 45
6 G 2 40
And, this is with all the column renaming and "housekeeping":
df_item_in_cat.melt(value_name='item').merge(df_charac_by_cat, left_on='variable', right_on='cat')\
.groupby('item')['charac'].agg(['count','sum']).reset_index()\
.rename(columns={'count':'cat_nb','sum':'sum_charac'})
Output:
item cat_nb sum_charac
0 A 3 45
1 B 4 70
2 C 1 10
3 D 1 10
4 E 3 60
5 F 2 45
6 G 2 40
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With