I have a dataframe of transactions and social group of the customer:
print(df.sample(10))
Shop Transaction_value Social Group
7 KFC 7 Rich
22 Burger King 342 Rich
19 Burger King 6 Rich
5 KFC 2 Poor
14 McDonalds 245 Rich
2 KFC 3 Poor
16 McDonalds 56 Poor
6 KFC 6 Poor
20 Burger King 23 Poor
8 KFC 5 Poor
I have done a groupby which tells me the most common social group for each shop:
(df.groupby(['Shop', 'Social Group'])['Transaction_value'].count())
Shop Social Group
Burger King Poor 7
Rich 3
KFC Poor 6
Rich 3
McDonalds Poor 3
Rich 6
I want to divide the numbers above by the value_counts()
of each social group:
df['Social Group'].value_counts()
Poor 16
Rich 12
So in my first groupby, wherever we have Poor
I want to divide by 16. Wherever we have Rich
I want to divide by 12.
So I'll have a dataframe like this:
Shop Social Group
Burger King Poor 0.43
Rich 0.25
KFC Poor 0.37
Rich 0.37
McDonalds Poor 0.25
Rich 0.5
I have tried div()
for this. I thought the index would match up with in each dataframe, but it doesn't work:
(df.groupby(['Shop', 'Social Group'])['Transaction_value']
.count()
.div(df['Social Group'].value_counts()))
ValueError: cannot join with no overlapping index names
Is what I am trying to do even possible with built in pandas functions?
I think I can do this with a for loop - but it will take a lot of time.
My df:
df.to_dict()
{'Shop': {0: 'KFC',
1: 'KFC',
2: 'KFC',
3: 'KFC',
4: 'KFC',
5: 'KFC',
6: 'KFC',
7: 'KFC',
8: 'KFC',
9: 'McDonalds',
10: 'McDonalds',
11: 'McDonalds',
12: 'McDonalds',
13: 'McDonalds',
14: 'McDonalds',
15: 'McDonalds',
16: 'McDonalds',
17: 'McDonalds',
18: 'Burger King',
19: 'Burger King',
20: 'Burger King',
21: 'Burger King',
22: 'Burger King',
23: 'Burger King',
24: 'Burger King',
25: 'Burger King',
26: 'Burger King',
27: 'Burger King'},
'Transaction_value': {0: 1,
1: 2,
2: 3,
3: 34,
4: 2,
5: 2,
6: 6,
7: 7,
8: 5,
9: 4,
10: 3,
11: 2,
12: 12,
13: 31,
14: 245,
15: 123,
16: 56,
17: 67,
18: 68,
19: 6,
20: 23,
21: 44,
22: 342,
23: 234,
24: 3,
25: 234,
26: 666,
27: 88},
'Social Group': {0: 'Poor',
1: 'Rich',
2: 'Poor',
3: 'Poor',
4: 'Rich',
5: 'Poor',
6: 'Poor',
7: 'Rich',
8: 'Poor',
9: 'Rich',
10: 'Rich',
11: 'Rich',
12: 'Rich',
13: 'Rich',
14: 'Rich',
15: 'Poor',
16: 'Poor',
17: 'Poor',
18: 'Poor',
19: 'Rich',
20: 'Poor',
21: 'Poor',
22: 'Rich',
23: 'Poor',
24: 'Poor',
25: 'Rich',
26: 'Poor',
27: 'Poor'}}
You are close, need level=1
for match second level of MultiIndex
:
s = df['Social Group'].value_counts()
s1 = df.groupby(['Shop', 'Social Group'])['Transaction_value'].count().div(s, level=1)
print (s1)
Shop Social Group
Burger King Poor 0.4375
Rich 0.2500
KFC Poor 0.3750
Rich 0.2500
McDonalds Poor 0.1875
Rich 0.5000
dtype: float64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With