I have a dataframe and I want to calculate the mean of Sales per store and across all stores. I wrote code that calculates the mean, but I am looking for a more efficient way.
DF
Cashier#  Store#  Sales  Refunds
001       001     100    1
002       001     150    2
003       001     200    2
004       002     400    1
005       002     600    4
DF-Desired
Cashier#  Store#  Sales  Refunds  Sales_StoreAvg  Sales_All_Stores_Avg
001       001     100    1        150             290
002       001     150    2        150             290
003       001     200    2        150             290
004       002     400    1        500             290
005       002     600    4        500             290
My attempt: I created two additional dataframes and then did a left join.
df.groupby(['Store#']).sum().reset_index().groupby('Sales').mean()
I think you need GroupBy.transform to create a new column filled with the aggregated values, here using mean:
df['Sales_StoreAvg'] = df.groupby('Store#')['Sales'].transform('mean')
df['Sales_All_Stores_Avg'] = df['Sales'].mean()
print (df)
   Cashier#  Store#  Sales  Refunds  Sales_StoreAvg  Sales_All_Stores_Avg
0         1       1    100        1             150                 290.0
1         2       1    150        2             150                 290.0
2         3       1    200        2             150                 290.0
3         4       2    400        1             500                 290.0
4         5       2    600        4             500                 290.0
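To illustrate why transform fits here, the following is a minimal, self-contained sketch. It assumes the sample data from the question, with Cashier# and Store# stored as plain integers (an assumption, since the question shows them zero-padded). A plain groupby mean collapses the data to one row per store, while transform returns a value for every original row, so it can be assigned directly as a column without a join.

```python
import pandas as pd

# Sample data from the question (IDs stored as integers: an assumption).
df = pd.DataFrame({
    "Cashier#": [1, 2, 3, 4, 5],
    "Store#": [1, 1, 1, 2, 2],
    "Sales": [100, 150, 200, 400, 600],
    "Refunds": [1, 2, 2, 1, 4],
})

# A plain aggregation collapses to one row per store,
# which is why the original attempt needed a join afterwards.
per_store = df.groupby("Store#")["Sales"].mean()

# transform computes the same per-store mean but broadcasts it back
# onto the original rows, aligned with df's index.
df["Sales_StoreAvg"] = df.groupby("Store#")["Sales"].transform("mean")

# The overall mean is a single scalar; assigning it fills the whole column.
df["Sales_All_Stores_Avg"] = df["Sales"].mean()

print(per_store)
print(df)
```

Because the transformed result has the same length and index as df, no merge step is needed.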
Use this, with transform and assign:
df.assign(Sales_StoreAvg=df.groupby('Store#')['Sales'].transform('mean'),
          Sales_All_Stores_Avg=df['Sales'].mean()).astype(int)
Output:
   Cashier#  Store#  Sales  Refunds  Sales_All_Stores_Avg  Sales_StoreAvg
0         1       1    100        1                   290             150
1         2       1    150        2                   290             150
2         3       1    200        2                   290             150
3         4       2    400        1                   290             500
4         5       2    600        4                   290             500
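Two details of the assign approach are worth spelling out. assign returns a new DataFrame and leaves the original untouched, and a trailing astype(int) converts every column and truncates any non-integer mean. The sketch below is one possible variation, not the answer's exact code: it rebuilds the sample data (IDs as integers, an assumption) and rounds only the two new columns before casting them.

```python
import pandas as pd

# Sample data from the question (IDs stored as integers: an assumption).
df = pd.DataFrame({
    "Cashier#": [1, 2, 3, 4, 5],
    "Store#": [1, 1, 1, 2, 2],
    "Sales": [100, 150, 200, 400, 600],
    "Refunds": [1, 2, 2, 1, 4],
})

# assign builds a new DataFrame; df itself is not modified.
result = df.assign(
    Sales_StoreAvg=df.groupby("Store#")["Sales"].transform("mean"),
    Sales_All_Stores_Avg=df["Sales"].mean(),
)

# Round and cast only the average columns, so other columns keep their
# dtypes and fractional means are rounded rather than truncated.
avg_cols = ["Sales_StoreAvg", "Sales_All_Stores_Avg"]
result = (
    result.round({col: 0 for col in avg_cols})
          .astype({col: int for col in avg_cols})
)

print(result)
```

On recent Python and pandas versions the new columns should appear in the order the keyword arguments are written, so the column order may differ from the output shown above.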