Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to do a vlookup in pandas with counts

I have the two following input dataframes :

df_item_in_cat
      a     b     c     d
0     A     A     B     A
1     B     B     E     B
2     C     E     F     E
3     D     F     G     G

df_charac_by_cat
    cat      charac
0     a          10
1     b          20
2     c          25
3     d          15

And I would like to produce the below dataframe :

df
   item   cat_nb   sum_charac
0     A        3           45
1     B        4           70
2     C        1           10
3     D        1           10
4     E        2           45
5     F        2           45
6     G        2           40

where :

  • cat_nb is the nb of columns where each item appear on df_item_in_cat
  • sum_charac is the sum of charac from df_charac_by_cat of the different categories

If needed, I can get the Serie :

s_items

0  A
1  B
2  C
3  D
4  E
5  F
6  G

To be more concrete, I have products that belong to categories, and I have shops that are associated with a category. And I want to know in how many shops each product is available, based on the caterogy of each product, and the category of each shop. Hope it's clear ..

  • (a,b,c,d) are categories
  • (A,B,C,D,E,F,G) are products
  • (10,20,25,15) are the number of shops associated for each category

ex : product A that is in 3 categories (a,b,d), is available in 45 shops, because 10+20+15 = 45

It's like a "vlookup" in excel.

I'm pretty sure I could do it with an iterrows, but I'm looking for a "beautiful" solution.

I'd like to show you some researchs, but I really have no idea how to do that .. I tried with merges, or pivot, but it's not really usefull.

like image 512
Matthieu Veron Avatar asked Jan 27 '23 15:01

Matthieu Veron


1 Answers

You need:

df_item_in_cat.melt().merge(df_charac_by_cat, left_on='variable', right_on='cat')\
   .groupby('value')['charac'].agg(['count','sum'])

Output:

  value  count  sum
0     A      3   45
1     B      4   70
2     C      1   10
3     D      1   10
4     E      3   60
5     F      2   45
6     G      2   40

And, this is with all the column renaming and "housekeeping":

df_item_in_cat.melt(value_name='item').merge(df_charac_by_cat, left_on='variable', right_on='cat')\
              .groupby('item')['charac'].agg(['count','sum']).reset_index()\
              .rename(columns={'count':'cat_nb','sum':'sum_charac'})

Output:

  item  cat_nb  sum_charac
0    A       3          45
1    B       4          70
2    C       1          10
3    D       1          10
4    E       3          60
5    F       2          45
6    G       2          40
like image 76
Scott Boston Avatar answered Feb 02 '23 00:02

Scott Boston