
Sum product and groupby

I have a dataframe that looks like this:

allHoldingsFund

      BrokerBestRate  notional_current  DistanceBestRate
0           CITI          7.859426e+05          0.023194
1           WFPBS         3.609674e+06         -0.023041
2           WFPBS         1.488828e+06         -0.023041
3           JPM           3.484168e+05         -0.106632
4           CITI          6.088499e+05          0.023194
5           WFPBS         8.665558e+06         -0.023041
6           WFPBS         4.219563e+05         -0.023041

I am trying to compute a sum product (the sum of notional_current * DistanceBestRate) per group in one go, without adding an extra product column to the dataframe.

I have tried this line of code:

allHoldingsFund.groupby(['BrokerBestRate'])['notional_current']*['DistanceBestRate'].sum() 

How can I compute the sum product and then aggregate it using groupby?

Desired output

BrokerBestRate      product of (notional_current  and DistanceBestRate)
   CITI              654654645665466
   JPM               453454534545367
  WFPBS              345345345345435

Many Thanks

SBad asked Jun 13 '18 16:06


2 Answers

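You can build the product column before the groupby (here df is the asker's allHoldingsFund). assign returns a new DataFrame, so no extra column is added to the original: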

df.assign(col=df.notional_current*df.DistanceBestRate).groupby('BrokerBestRate',as_index=False).col.sum()
Out[372]: 
  BrokerBestRate            col
0           CITI   32350.817245
1            JPM  -37152.380218
2          WFPBS -326860.001568
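
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same approach. The frame is rebuilt by hand from the values printed in the question, and the variable name `result` is only illustrative:

```python
import pandas as pd

# Sample data copied from the question (values as printed there).
allHoldingsFund = pd.DataFrame({
    "BrokerBestRate": ["CITI", "WFPBS", "WFPBS", "JPM", "CITI", "WFPBS", "WFPBS"],
    "notional_current": [7.859426e+05, 3.609674e+06, 1.488828e+06, 3.484168e+05,
                         6.088499e+05, 8.665558e+06, 4.219563e+05],
    "DistanceBestRate": [0.023194, -0.023041, -0.023041, -0.106632,
                         0.023194, -0.023041, -0.023041],
})

# assign() returns a copy that carries the temporary product column "col";
# the groupby then sums that column per broker.
result = (
    allHoldingsFund
    .assign(col=allHoldingsFund["notional_current"] * allHoldingsFund["DistanceBestRate"])
    .groupby("BrokerBestRate", as_index=False)["col"]
    .sum()
)
print(result)

# The original frame still has only its three columns.
print(list(allHoldingsFund.columns))
```

Because the product column only exists on the intermediate copy, this stays a single expression even though a column is technically created along the way.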
BENY answered Oct 13 '22 12:10


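The simplest, but typically slowest, way would be to use apply. Selecting the two numeric columns explicitly keeps the string column BrokerBestRate from ever reaching prod:

In [43]: df.groupby("BrokerBestRate")[["notional_current", "DistanceBestRate"]].apply(lambda x: x.prod(axis=1).sum())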
Out[43]: 
BrokerBestRate
CITI      32350.817245
JPM      -37152.380218
WFPBS   -326860.001568
dtype: float64

But you can also compute the product column first, and then call groupby on that:

In [44]: df.eval("notional_current * DistanceBestRate").groupby(df.BrokerBestRate).sum()
Out[44]: 
BrokerBestRate
CITI      32350.817245
JPM      -37152.380218
WFPBS   -326860.001568
dtype: float64

In [45]: df[["notional_current", "DistanceBestRate"]].prod(axis=1).groupby(df["BrokerBestRate"]).sum()
Out[45]: 
BrokerBestRate
CITI      32350.817245
JPM      -37152.380218
WFPBS   -326860.001568
dtype: float64
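
A closely related variant, sketched here as an assumption rather than something from the answer itself, multiplies the two columns directly and groups the resulting Series by the broker column. The frame is rebuilt from the question's printed values, and the names `product` and `by_broker` are only illustrative:

```python
import pandas as pd

# Sample data copied from the question (values as printed there).
df = pd.DataFrame({
    "BrokerBestRate": ["CITI", "WFPBS", "WFPBS", "JPM", "CITI", "WFPBS", "WFPBS"],
    "notional_current": [7.859426e+05, 3.609674e+06, 1.488828e+06, 3.484168e+05,
                         6.088499e+05, 8.665558e+06, 4.219563e+05],
    "DistanceBestRate": [0.023194, -0.023041, -0.023041, -0.106632,
                         0.023194, -0.023041, -0.023041],
})

# Row-wise product as a standalone Series; nothing is added to df.
product = df["notional_current"] * df["DistanceBestRate"]

# Group the Series by the broker column (aligned on the shared index) and sum.
by_broker = product.groupby(df["BrokerBestRate"]).sum()
print(by_broker)
```

The grouping key is passed as a Series rather than a column name because `product` has no BrokerBestRate column of its own; pandas aligns the two on their shared index.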
DSM answered Oct 13 '22 14:10