Pandas: aggregating multiple columns with multiple functions

pandas in Python and dplyr in R are both flexible data-wrangling tools. For example, with dplyr in R one can do the following:

custom_func <- function(col1, col2) length(col1) + length(col2)

ChickWeight %>% 
  group_by(Diet) %>% 
  summarise(m_weight = mean(weight), 
            var_time = var(Time), 
            covar = cov(weight, Time),
            odd_stat = custom_func(weight, Time))

Notice how, in one statement:

  • I can aggregate over multiple columns in one line.
  • I can apply different functions over these multiple columns in one line.
  • I can use functions that take two columns as input.
  • I can throw in custom functions for any of these.
  • I can declare new column names for these aggregations.

Is such a pattern also possible in pandas? Note that I am interested in doing this in a short statement (so not creating three different dataframes and then joining them).

cantdutchthis asked Mar 11 '16 10:03





1 Answer

With pandas, groupby().apply() lets you run several functions in a single groupby aggregation. The statistical methods used here (mean(), var() and cov()) are built into pandas Series. A custom function must reduce each group's data to a single value, for example with mean() or sum():

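import pandas as pd

def customfct(x, y):
    # Custom two-column statistic: mean of the element-wise ratio
    # (rows where y is 0 give inf, which carries through to the mean)
    data = x / y
    return data.mean()

def f(group):
    # Return one named value per statistic so each group collapses to one row
    return pd.Series({
        'm_weight': group['weight'].mean(),
        'var_time': group['Time'].var(),
        'cov': group['weight'].cov(group['Time']),
        'odd_stat': customfct(group['weight'], group['Time']),
    })

# df is assumed to hold the ChickWeight data (columns weight, Time, Diet)
aggdf = df.groupby('Diet').apply(f)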
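
For the statistics that depend on a single column, named aggregation (available in pandas 0.25 and later) may be closer to the dplyr syntax, since it declares the new column name, the source column and the function in one place. The sketch below is only an illustration: the small DataFrame is made-up toy data standing in for ChickWeight, and the names single_col, m_weight and var_time are chosen for the example. Functions that need two columns at once, such as the covariance, still have to go through apply() as above.

```python
import pandas as pd

# Hypothetical toy data standing in for ChickWeight (values are made up)
df = pd.DataFrame({
    "Diet":   [1, 1, 1, 2, 2, 2],
    "Time":   [0, 2, 4, 0, 2, 4],
    "weight": [40.0, 50.0, 60.0, 40.0, 55.0, 70.0],
})

# Named aggregation: new_column_name=(source_column, function)
single_col = df.groupby("Diet").agg(
    m_weight=("weight", "mean"),
    var_time=("Time", "var"),
)

print(single_col)
```

The result has one row per Diet and flat column names, so no MultiIndex flattening is needed afterwards.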
Parfait answered Nov 14 '22 21:11
