Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A better way to aggregate data and keep table structure and column names with Pandas

Tags:

python

pandas

Suppose I have a dataset like the following

df = pd.DataFrame({'x1':['a','a','b','b'], 'x2':[True, True, True, False], 'x3':[1,1,1,1]})
df
  x1     x2  x3
0  a   True   1
1  a   True   1
2  b   True   1
3  b  False   1

I often want to perform a groupby-aggregate operation where I group by multiple columns and apply multiple functions to one column. Furthermore, I usually don't want a multi-indexed, multi-level table. To accomplish this, it's taking me three lines of code which seems excessive.

For example

bg = df.groupby(['x1', 'x2']).agg({'x3': {'my_sum':np.sum, 'my_mean':np.mean}})
bg.columns = bg.columns.droplevel(0)
bg.reset_index()

Is there a better way? Not to gripe, but I'm coming from an R/data.table background where something like this is a nice one-liner like

df[, list(my_sum=sum(x3), my_mean=mean(x3)), by=list(x1, x2)]
like image 293
Ben Avatar asked Jan 06 '16 03:01

Ben


1 Answers

How about this:

In [81]: bg = df.groupby(['x1', 'x2'], as_index=False)['x3'].agg({'my_sum':np.sum, 'my_mean':np.mean})

In [82]: print bg
  x1     x2  my_sum  my_mean
0  a   True       2        1
1  b  False       1        1
2  b   True       1        1
like image 189
Happy001 Avatar answered Oct 01 '22 21:10

Happy001