
conditional sums for pandas aggregate


I recently switched from R to Python and have had some trouble getting used to pandas data frames after R's data.table. I'd like to take a column of strings, check each one for a value, and count the matches, broken down by user. So I would like to take this data:

   A_id       B    C
1:   a1    "up"  100
2:   a2  "down"  102
3:   a3    "up"  100
3:   a3    "up"  250
4:   a4  "left"  100
5:   a5 "right"  102

And return:

   A_id_grouped   sum_up   sum_down  ...  over_200_up
1:           a1        1          0  ...            0
2:           a2        0          1  ...            0
3:           a3        2          0  ...            1
4:           a4        0          0  ...            0
5:           a5        0          0  ...            0

Previously I did this in R (using data.table):

> DT[, list(sum_up = sum(B == "up"),
+           sum_down = sum(B == "down"),
+           ...,
+           over_200_up = sum(B == "up" & C > 200)),
+    by = list(A_id_grouped = A_id)]

However, all of my recent attempts with Python have failed:

DT.agg({"D": [np.sum(DT[DT["B"]=="up"]),np.sum(DT[DT["B"]=="up"])],
...     "C": np.sum(DT[(DT["B"]=="up") & (DT["C"]>200)])
    })

Thank you in advance! It seems like a simple question, but I couldn't find an answer anywhere.

stites asked Mar 06 '13 22:03





1 Answer

To complement unutbu's answer, here's an approach using apply on the groupby object.

>>> df.groupby('A_id').apply(lambda x: pd.Series(dict(
...     sum_up=(x.B == 'up').sum(),
...     sum_down=(x.B == 'down').sum(),
...     over_200_up=((x.B == 'up') & (x.C > 200)).sum()
... )))
      over_200_up  sum_down  sum_up
A_id
a1              0         0       1
a2              0         1       0
a3              1         0       2
a4              0         0       0
a5              0         0       0
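
The same counts can also be produced without `apply`, by building one boolean column per condition and summing within each group, since summing booleans counts the `True` values. What follows is only a minimal sketch, assuming the sample data from the question; the names `flags` and `result` are illustrative and do not come from the original posts.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A_id": ["a1", "a2", "a3", "a3", "a4", "a5"],
    "B": ["up", "down", "up", "up", "left", "right"],
    "C": [100, 102, 100, 250, 100, 102],
})

# One boolean indicator column per condition (hypothetical helper frame).
flags = pd.DataFrame({
    "A_id": df["A_id"],
    "sum_up": df["B"] == "up",
    "sum_down": df["B"] == "down",
    "over_200_up": (df["B"] == "up") & (df["C"] > 200),
})

# Summing the indicators per group gives the conditional counts.
result = flags.groupby("A_id").sum().astype(int)
print(result)
```

On larger frames this may be faster than calling a Python lambda once per group, because the comparisons run once over whole columns, though that depends on the data and the pandas version.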
Garrett answered Nov 02 '22 00:11
