I have written some code to compute a weighted average using pivot tables in pandas. However, I am not sure how to add the actual column which performs the weighted averaging (Add a new column where each row contains value of 'cumulative'/'COUNT').
The data looks like so:
VALUE COUNT GRID agb
1 43 1476 1051
2 212 1476 2983
5 7 1477 890
4 1361 1477 2310
Here is my code:
import pandas
# Read input data (DataFrame.from_csv has been removed from pandas; read_csv replaces it)
lup_df = pandas.read_csv(o_dir+LUP+'.csv',index_col=False)
# Insert a new column with area * variable
lup_df['cumulative'] = lup_df['COUNT']*lup_df['agb']
# Create and output pivot table (the old rows= argument is now called index=)
lup_pvt = pandas.pivot_table(lup_df, 'agb', index=['GRID'])
# TODO: Add a new column where each row contains value of 'cumulative'/'COUNT'
lup_pvt.to_csv(o_dir+PIVOT+'.csv',index=True,header=True,sep=',')
How can I do this?
So you want, for each value of grid, the weighted average of the agb column, where the weights are the values in the count column. If that interpretation is correct, I think this does the trick with groupby:
import numpy as np
import pandas as pd
np.random.seed(0)
n = 50
df = pd.DataFrame({'count': np.random.choice(np.arange(10)+1, n),
                   'grid': np.random.choice(np.arange(10)+50, n),
                   'value': np.random.randn(n) + 12})
df['prod'] = df['count'] * df['value']
grouped = df.groupby('grid').sum()
grouped['wtdavg'] = grouped['prod'] / grouped['count']
print(grouped)
count value prod wtdavg
grid
50 22 57.177042 243.814417 11.082474
51 27 58.801386 318.644085 11.801633
52 11 34.202619 135.127942 12.284358
53 24 59.340084 272.836636 11.368193
54 39 137.268317 482.954857 12.383458
55 47 79.468986 531.122652 11.300482
56 17 38.624369 214.188938 12.599349
57 22 38.572429 279.948202 12.724918
58 27 36.492929 327.315518 12.122797
59 34 60.851671 408.306429 12.009013
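To connect this back to the question, here is a minimal sketch of the same groupby approach applied to the four sample rows from the question. It assumes the real CSV has the same VALUE, COUNT, GRID and agb columns; the DataFrame is typed in by hand here rather than read from the file, and the wtdavg column name is only an illustration.

```python
import pandas as pd

# Sample rows copied from the question (typed in by hand, not read from the CSV).
lup_df = pd.DataFrame({
    'VALUE': [1, 2, 5, 4],
    'COUNT': [43, 212, 7, 1361],
    'GRID': [1476, 1476, 1477, 1477],
    'agb': [1051, 2983, 890, 2310],
})

# Weight each agb value by its COUNT.
lup_df['cumulative'] = lup_df['COUNT'] * lup_df['agb']

# Sum the weights and the weighted values within each GRID cell.
lup_pvt = lup_df.groupby('GRID')[['COUNT', 'cumulative']].sum()

# Weighted average = sum(COUNT * agb) / sum(COUNT) for each GRID.
lup_pvt['wtdavg'] = lup_pvt['cumulative'] / lup_pvt['COUNT']
print(lup_pvt)
```

The resulting frame is indexed by GRID, so it can be written out with to_csv in the same way as the pivot table in the question.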
Or, if you want to be a bit slick and write a weighted average function you can use over and over:
import numpy as np
import pandas as pd
np.random.seed(0)
n = 50
df = pd.DataFrame({'count': np.random.choice(np.arange(10)+1, n),
                   'grid': np.random.choice(np.arange(10)+50, n),
                   'value': np.random.randn(n) + 12})
def wavg(val_col_name, wt_col_name):
    # Build a function that computes the weighted average of one group
    def inner(group):
        return (group[val_col_name] * group[wt_col_name]).sum() / group[wt_col_name].sum()
    inner.__name__ = 'wtd_avg'
    return inner
slick = df.groupby('grid').apply(wavg('value', 'count'))
print(slick)
grid
50 11.082474
51 11.801633
52 12.284358
53 11.368193
54 12.383458
55 11.300482
56 12.599349
57 12.724918
58 12.122797
59 12.009013
dtype: float64
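As a variation that is not part of the answer above, NumPy's np.average accepts a weights argument, so the per-group function can be a one-liner. This is only a sketch: the small DataFrame and the wavg_np helper are made up for illustration, and the columns are selected explicitly before apply so that only value and count are passed to the function.

```python
import numpy as np
import pandas as pd

# Small hypothetical data set, chosen so the result is easy to check by hand.
df = pd.DataFrame({
    'grid': [50, 50, 51, 51],
    'count': [1, 3, 2, 2],
    'value': [10.0, 14.0, 8.0, 12.0],
})

def wavg_np(group):
    # np.average computes sum(value * count) / sum(count) for the group.
    return np.average(group['value'], weights=group['count'])

# Select the needed columns, then apply the function to each grid group.
result = df.groupby('grid')[['value', 'count']].apply(wavg_np)
print(result)
```

For grid 50 this gives (10*1 + 14*3) / 4 = 13.0, and for grid 51 it gives (8*2 + 12*2) / 4 = 10.0.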