Pandas DataFrame aggregate function using multiple columns

Tags: python, pandas

Is there a way to write an aggregation function, like those passed to the DataFrame.agg method, that has access to more than one column of the data being aggregated? Typical use cases would be weighted average and weighted standard deviation functions.

I would like to be able to write something like

    def wAvg(c, w):
        return ((c * w).sum() / w.sum())

    df = DataFrame(....)  # df has columns c and w, i want weighted average
                          # of c using w as weight.
    df.aggregate({"c": wAvg})  # and somehow tell it to use w column as weights ...
user1444817 asked Jun 08 '12 15:06




1 Answer

Yes; use the .apply(...) function, which will be called on each sub-DataFrame. For example:

    grouped = df.groupby(keys)

    def wavg(group):
        d = group['data']
        w = group['weights']
        return (d * w).sum() / w.sum()

    grouped.apply(wavg)
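For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. The sample data and the `key` grouping column are made up for illustration; `data` and `weights` follow the column names used above. Selecting the two columns after `groupby` is one way to make sure the function only sees the columns it needs.

```python
import pandas as pd

# Hypothetical sample data; only the 'data' and 'weights' names come from the answer.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "data": [1.0, 3.0, 10.0, 20.0],
    "weights": [1.0, 3.0, 1.0, 1.0],
})

def wavg(group):
    """Weighted average of 'data' using 'weights', computed per sub-DataFrame."""
    d = group["data"]
    w = group["weights"]
    return (d * w).sum() / w.sum()

# Each group arrives as a DataFrame holding both selected columns,
# so the function can combine them freely.
result = df.groupby("key")[["data", "weights"]].apply(wavg)
print(result)
```

The result is a Series indexed by the grouping key, with one weighted average per group. A weighted standard deviation could presumably be written the same way, by changing only the body of the function.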
Wes McKinney answered Sep 19 '22 02:09
