I have a dataframe:
Out[78]:
   contract month  year  buys  adjusted_lots    price
0         W     Z     5  Sell             -5   554.85
1         C     Z     5  Sell             -3   424.50
2         C     Z     5  Sell             -2   424.00
3         C     Z     5  Sell             -2   423.75
4         C     Z     5  Sell             -3   423.50
5         C     Z     5  Sell             -2   425.50
6         C     Z     5  Sell             -3   425.25
7         C     Z     5  Sell             -2   426.00
8         C     Z     5  Sell             -2   426.75
9        CC     U     5   Buy              5  3328.00
10       SB     V     5   Buy              5    11.65
11       SB     V     5   Buy              5    11.64
12       SB     V     5   Buy              2    11.60
I need the sum of adjusted_lots, and the weighted average of price (weighted by adjusted_lots), grouped by all the other columns, i.e. grouped by (contract, month, year and buys).
A similar solution in R was achieved with the following dplyr code, but I have been unable to do the same in pandas.
> newdf = df %>%
+   select(contract, month, year, buys, adjusted_lots, price) %>%
+   group_by(contract, month, year, buys) %>%
+   summarise(qty = sum(adjusted_lots),
+             avgpx = weighted.mean(x = price, w = adjusted_lots),
+             comdty = "Comdty")

> newdf
Source: local data frame [4 x 6]

  contract month year comdty qty     avgpx
1        C     Z    5 Comdty -19  424.8289
2       CC     U    5 Comdty   5 3328.0000
3       SB     V    5 Comdty  12   11.6375
4        W     Z    5 Comdty  -5  554.8500
Is the same possible with groupby, or is there another solution?
Calculate a weighted average in pandas using NumPy: the numpy library has a function, average(), which accepts an optional argument to specify the weights of the values. The function takes an array through the argument a= and an array of weights through the argument weights=.
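For example, a minimal sketch with illustrative numbers (not tied to the question's data):

import numpy as np

# np.average(a, weights=w) computes sum(a * w) / sum(w)
values = np.array([424.50, 424.00, 423.75])
weights = np.array([3, 2, 2])

print(np.average(values, weights=weights))  # 424.142857...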
Pandas groupby mean: to get the average (or mean) value in each group, you can apply the pandas mean() function directly to the columns selected from the result of a pandas groupby.
The groupby() function is used to split the data into groups based on some criteria; pandas objects can be split on any of their axes. Abstractly, grouping provides a mapping of labels to group names (the sort parameter controls whether the group keys are sorted).
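A small sketch of the plain (unweighted) groupby mean, assuming a dataframe shaped like the one in the question; this is not yet the weighted mean being asked for, but it shows the groupby mechanics:

import pandas as pd

# A few illustrative rows with the same columns as the question's dataframe.
df = pd.DataFrame({
    "contract": ["C", "C", "SB", "SB"],
    "month": ["Z", "Z", "V", "V"],
    "year": [5, 5, 5, 5],
    "buys": ["Sell", "Sell", "Buy", "Buy"],
    "adjusted_lots": [-3, -2, 5, 2],
    "price": [424.50, 424.00, 11.65, 11.60],
})

# Unweighted mean of price per group.
print(df.groupby(["contract", "month", "year", "buys"])["price"].mean())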
EDIT: updated the aggregation so that it works with recent versions of pandas.
To apply multiple functions to a groupby object, use named aggregation: keyword arguments whose values are tuples of the column to aggregate and the aggregation function to apply:
import numpy as np

# Define a lambda function to compute the weighted mean:
wm = lambda x: np.average(x, weights=df.loc[x.index, "adjusted_lots"])

# Passing a nested dictionary of column -> {name: function} is deprecated since pandas 0.20:
# f = {'adjusted_lots': ['sum'], 'price': {'weighted_mean': wm}}
# df.groupby(["contract", "month", "year", "buys"]).agg(f)

# Groupby and aggregate with named aggregation [1]:
df.groupby(["contract", "month", "year", "buys"]).agg(
    adjusted_lots=("adjusted_lots", "sum"),
    price_weighted_mean=("price", wm),
)

                          adjusted_lots  price_weighted_mean
contract month year buys
C        Z     5    Sell            -19           424.828947
CC       U     5    Buy               5          3328.000000
SB       V     5    Buy              12            11.637500
W        Z     5    Sell             -5            554.850000
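If you also want the group keys back as ordinary columns plus the constant comdty column, to match the flat R output, you can chain reset_index() and assign(). A sketch, assuming df and wm are defined as above (the names qty, avgpx and comdty simply mirror the R code):

newdf = (
    df.groupby(["contract", "month", "year", "buys"])
      .agg(qty=("adjusted_lots", "sum"),   # sum of lots per group
           avgpx=("price", wm))            # weighted mean of price per group
      .reset_index()                       # group keys become ordinary columns
      .assign(comdty="Comdty")             # constant column, as in the R version
)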
Hope this helps
[1]: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html#groupby-aggregation-with-relabeling
Computing a weighted average with groupby(...).apply(...) can be very slow (roughly 100x slower than the following approach). See my answer (and others) on this thread.
import pandas as pd

def weighted_average(df, data_col, weight_col, by_col):
    # Vectorised weighted mean per group: two grouped sums instead of a
    # Python-level apply, which is why it is so much faster.
    df['_data_times_weight'] = df[data_col] * df[weight_col]
    # Zero out weights where the data is NaN, so missing values don't skew the result.
    df['_weight_where_notnull'] = df[weight_col] * pd.notnull(df[data_col])
    g = df.groupby(by_col)
    result = g['_data_times_weight'].sum() / g['_weight_where_notnull'].sum()
    # Clean up the temporary helper columns.
    del df['_data_times_weight'], df['_weight_where_notnull']
    return result
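Usage on the question's dataframe would look like this (assuming df is the dataframe from the question):

# Weighted mean of price, weighted by adjusted_lots, per (contract, month, year, buys).
avgpx = weighted_average(df, "price", "adjusted_lots",
                         ["contract", "month", "year", "buys"])
print(avgpx)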