I recently switched from R to Python and have been having some trouble getting used to data frames again, as opposed to R's data.table. I'd like to take a column of strings, check each one for a value, and then count the matches, broken down by user. So I would like to take this data:
       A_id       B   C
    1:   a1    "up" 100
    2:   a2  "down" 102
    3:   a3    "up" 100
    3:   a3    "up" 250
    4:   a4  "left" 100
    5:   a5 "right" 102
And return:
       A_id_grouped sum_up sum_down ... over_200_up
    1:           a1      1        0 ...           0
    2:           a2      0        1 ...           0
    3:           a3      2        0 ...           1
    4:           a4      0        0 ...           0
    5:           a5      0        0 ...           0
Previously I did this in R (using data.table):
    DT[, list(sum_up = sum(B == "up"),
              sum_down = sum(B == "down"),
              ...,
              over_200_up = sum(B == "up" & C > 200)),
       by = list(A_id_grouped = A_id)]
However, all of my recent attempts in Python have failed:
    DT.agg({"D": [np.sum(DT[DT["B"]=="up"]), np.sum(DT[DT["B"]=="up"])],
            ...
            "C": np.sum(DT[(DT["B"]=="up") & (DT["C"]>200)])
           })
Thank you in advance! It seems like a simple question, but I couldn't find an answer anywhere.
To complement unutbu's answer, here's an approach using apply on the groupby object:
    >>> df.groupby('A_id').apply(lambda x: pd.Series(dict(
    ...     sum_up=(x.B == 'up').sum(),
    ...     sum_down=(x.B == 'down').sum(),
    ...     over_200_up=((x.B == 'up') & (x.C > 200)).sum()
    ... )))
          over_200_up  sum_down  sum_up
    A_id
    a1              0         0       1
    a2              0         1       0
    a3              1         0       2
    a4              0         0       0
    a5              0         0       0
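If you want something you can paste and run end to end, here is a minimal sketch of a slightly different route: build one boolean column per condition first, then let a plain groupby sum count the True values. The sample frame is retyped from the question, and the names `flags` and `result` are my own, not from the original posts. It should give the same counts as the apply version, but treat it as an illustration rather than the one right way to do it.

```python
import pandas as pd

# Sample data retyped from the question (a3 appears twice).
df = pd.DataFrame({
    "A_id": ["a1", "a2", "a3", "a3", "a4", "a5"],
    "B": ["up", "down", "up", "up", "left", "right"],
    "C": [100, 102, 100, 250, 100, 102],
})

# One boolean column per condition; True counts as 1 when summed.
# `flags` is a hypothetical helper name used only for this sketch.
flags = pd.DataFrame({
    "A_id": df["A_id"],
    "sum_up": df["B"] == "up",
    "sum_down": df["B"] == "down",
    "over_200_up": (df["B"] == "up") & (df["C"] > 200),
})

# Summing the booleans per user gives the per-condition counts.
result = flags.groupby("A_id").sum().astype(int)
print(result)
```

Since the conditions are computed once on whole columns instead of once per group inside a lambda, this tends to scale better on large frames, though on data this small the difference will not matter.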