Is there a way to write an aggregation function as is used in DataFrame.agg
method, that would have access to more than one column of the data that is being aggregated? Typical use cases would be weighted average, weighted standard deviation funcs.
I would like to be able to write something like
def wAvg(c, w): return ((c * w).sum() / w.sum()) df = DataFrame(....) # df has columns c and w, i want weighted average # of c using w as weight. df.aggregate ({"c": wAvg}) # and somehow tell it to use w column as weights ...
Pandas comes with a whole host of sql-like aggregation functions you can apply when grouping on one or more columns. This is Python's closest equivalent to dplyr's group_by + summarise logic.
We can use the aggregate() function in R to produce summary statistics for one or more variables in a data frame.
Return Multiple Columns from pandas apply() You can return a Series from the apply() function that contains the new data. pass axis=1 to the apply() function which applies the function multiply to each row of the DataFrame, Returns a series of multiple columns from pandas apply() function.
Yes; use the .apply(...)
function, which will be called on each sub-DataFrame
. For example:
grouped = df.groupby(keys) def wavg(group): d = group['data'] w = group['weights'] return (d * w).sum() / w.sum() grouped.apply(wavg)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With