Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to divide into a pandas multiple groupby?

Tags:

python

pandas

I have a dataframe of transactions and social group of the customer:

print(df.sample(10))


           Shop  Transaction_value Social Group
7           KFC                  7         Rich
22  Burger King                342         Rich
19  Burger King                  6         Rich
5           KFC                  2         Poor
14    McDonalds                245         Rich
2           KFC                  3         Poor
16    McDonalds                 56         Poor
6           KFC                  6         Poor
20  Burger King                 23         Poor
8           KFC                  5         Poor

I have done a groupby which tells me the most common social group for each shop:

(df.groupby(['Shop', 'Social Group'])['Transaction_value'].count())

Shop         Social Group
Burger King  Poor            7
             Rich            3
KFC          Poor            6
             Rich            3
McDonalds    Poor            3
             Rich            6

I want to divide the numbers above by the value_counts() of each social group:

df['Social Group'].value_counts()

Poor    16
Rich    12

So in my first groupby, wherever we have Poor I want to divide by 16. Wherever we have Rich I want to divide by 12.

So I'll have a dataframe like this:

Shop         Social Group
Burger King  Poor            0.43
             Rich            0.25
KFC          Poor            0.37
             Rich            0.37
McDonalds    Poor            0.25
             Rich            0.5

I have tried div() for this. I thought the index would match up with in each dataframe, but it doesn't work:

(df.groupby(['Shop', 'Social Group'])['Transaction_value']
 .count()
 .div(df['Social Group'].value_counts()))

ValueError: cannot join with no overlapping index names

Is what I am trying to do even possible with built in pandas functions?

I think I can do this with a for loop - but it will take a lot of time.

My df:

df.to_dict()

{'Shop': {0: 'KFC',
  1: 'KFC',
  2: 'KFC',
  3: 'KFC',
  4: 'KFC',
  5: 'KFC',
  6: 'KFC',
  7: 'KFC',
  8: 'KFC',
  9: 'McDonalds',
  10: 'McDonalds',
  11: 'McDonalds',
  12: 'McDonalds',
  13: 'McDonalds',
  14: 'McDonalds',
  15: 'McDonalds',
  16: 'McDonalds',
  17: 'McDonalds',
  18: 'Burger King',
  19: 'Burger King',
  20: 'Burger King',
  21: 'Burger King',
  22: 'Burger King',
  23: 'Burger King',
  24: 'Burger King',
  25: 'Burger King',
  26: 'Burger King',
  27: 'Burger King'},
 'Transaction_value': {0: 1,
  1: 2,
  2: 3,
  3: 34,
  4: 2,
  5: 2,
  6: 6,
  7: 7,
  8: 5,
  9: 4,
  10: 3,
  11: 2,
  12: 12,
  13: 31,
  14: 245,
  15: 123,
  16: 56,
  17: 67,
  18: 68,
  19: 6,
  20: 23,
  21: 44,
  22: 342,
  23: 234,
  24: 3,
  25: 234,
  26: 666,
  27: 88},
 'Social Group': {0: 'Poor',
  1: 'Rich',
  2: 'Poor',
  3: 'Poor',
  4: 'Rich',
  5: 'Poor',
  6: 'Poor',
  7: 'Rich',
  8: 'Poor',
  9: 'Rich',
  10: 'Rich',
  11: 'Rich',
  12: 'Rich',
  13: 'Rich',
  14: 'Rich',
  15: 'Poor',
  16: 'Poor',
  17: 'Poor',
  18: 'Poor',
  19: 'Rich',
  20: 'Poor',
  21: 'Poor',
  22: 'Rich',
  23: 'Poor',
  24: 'Poor',
  25: 'Rich',
  26: 'Poor',
  27: 'Poor'}}
like image 287
SCool Avatar asked Mar 03 '23 01:03

SCool


1 Answers

You are close, need level=1 for match second level of MultiIndex:

s = df['Social Group'].value_counts()
s1 = df.groupby(['Shop', 'Social Group'])['Transaction_value'].count().div(s, level=1)
print (s1)
Shop         Social Group
Burger King  Poor            0.4375
             Rich            0.2500
KFC          Poor            0.3750
             Rich            0.2500
McDonalds    Poor            0.1875
             Rich            0.5000
dtype: float64
like image 112
jezrael Avatar answered Mar 16 '23 22:03

jezrael