
R perform calculations on groups for subset of rows

In other words, I want to group on a column and then perform calculations using only some of the rows per group.

The data set I have is:

    LoanRefId               Tran_Type TransactionAmount
103        11               LoanIssue         1000.0000
104        11           InitiationFee          171.0000
105        11                Interest           59.6729
106        11       AdministrationFee           64.9332
107        11 RaisedClientInstallment         1295.5757
108        11       ClientInstallment         1295.4700
109        11                  PaidUp            0.0000
110        11              Adjustment            0.1361
111        11                  PaidUp            0.0000
112        12               LoanIssue         3000.0000
113        12           InitiationFee          399.0000
114        12                Interest           94.9858
115        12       AdministrationFee           38.6975
116        12 RaisedClientInstallment         3532.6350
117        12       ClientInstallment         3532.6100
118        12                  PaidUp            0.0000
119        12              Adjustment            0.0733
120        12                  PaidUp            0.0000

I would like to repeat the following calculation for each group: ClientInstallment - LoanIssue.

So, group 1 will be for LoanRefId number 11. The calculation will take the ClientInstallment of 1295.47 and subtract the LoanIssue of 1000 to give me a new column, call it "Income", with value 295.47.

Is this possible using data.table or dplyr or any other clever tricks?

Alternatively, I can create two data summaries, one for ClientInstallment and one for LoanIssue, and then subtract them. But the truth is I need to do much more than just subtract two numbers, so I would need a data summary for each calculation, which is just plain unclever imho.

Any help is appreciated.

gmarais asked Oct 30 '25 03:10



1 Answer

We can use data.table. We convert the 'data.frame' to a 'data.table' with setDT(df1), group by 'LoanRefId', extract the 'TransactionAmount' where 'Tran_Type' is 'ClientInstallment' and where it is 'LoanIssue', and subtract the second from the first.

library(data.table)
setDT(df1)[,list(Income =TransactionAmount[Tran_Type=='ClientInstallment']-
       TransactionAmount[Tran_Type=='LoanIssue']), by = LoanRefId]
#  LoanRefId Income
#1:        11 295.47
#2:        12 532.61

We can also use dplyr with a similar approach:

library(dplyr)
df1 %>%
    group_by(LoanRefId) %>%
    summarise(Income = TransactionAmount[Tran_Type=='ClientInstallment']-
                       TransactionAmount[Tran_Type=='LoanIssue'])
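Since the question also asks about other tricks, the same per-group subsetting can be sketched in base R without any packages. This is only an illustration under assumptions: the small df1 below is a trimmed copy of the question's data, and the names income and result are made up for the example.

```r
# Trimmed copy of the question's data (assumption: only three rows per loan)
df1 <- data.frame(
  LoanRefId = rep(c(11, 12), each = 3),
  Tran_Type = rep(c("LoanIssue", "Interest", "ClientInstallment"), times = 2),
  TransactionAmount = c(1000, 59.6729, 1295.47, 3000, 94.9858, 3532.61)
)

# Split the rows by loan, then take the difference within each piece
income <- sapply(split(df1, df1$LoanRefId), function(g) {
  g$TransactionAmount[g$Tran_Type == "ClientInstallment"] -
    g$TransactionAmount[g$Tran_Type == "LoanIssue"]
})

# sapply() names the result by group; turn that back into a column
result <- data.frame(LoanRefId = as.numeric(names(income)),
                     Income = unname(income))
```

Like the versions above, this assumes each loan has exactly one 'ClientInstallment' row and one 'LoanIssue' row.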

Update

If a 'LoanRefId' is missing either the 'ClientInstallment' or the 'LoanIssue' row, we can use an if/else condition that returns 0 for that group

setDT(df1)[, list(Income = if (any(Tran_Type == 'ClientInstallment') &&
              any(Tran_Type == 'LoanIssue'))
            TransactionAmount[Tran_Type == 'ClientInstallment'] -
            TransactionAmount[Tran_Type == 'LoanIssue'] else 0), by = LoanRefId]
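The question mentions needing many more calculations than a single subtraction. One possible way to avoid a separate summary per calculation is to reshape to one row per loan and one column per transaction type, then compute on the columns. The sketch below uses data.table's dcast() and is not part of the original answer; the trimmed df1, the name wide, and the Fees column are assumptions made for illustration.

```r
library(data.table)

# Trimmed copy of the question's data (assumption: four rows per loan)
df1 <- data.frame(
  LoanRefId = rep(c(11, 12), each = 4),
  Tran_Type = rep(c("LoanIssue", "InitiationFee",
                    "AdministrationFee", "ClientInstallment"), times = 2),
  TransactionAmount = c(1000, 171, 64.9332, 1295.47,
                        3000, 399, 38.6975, 3532.61)
)

# One row per loan, one column per transaction type.
# fun.aggregate = sum collapses repeated types (such as the two PaidUp rows
# in the full data) into a single value.
wide <- dcast(as.data.table(df1), LoanRefId ~ Tran_Type,
              value.var = "TransactionAmount", fun.aggregate = sum)

# Any number of calculations can now refer to the columns by name
wide[, Income := ClientInstallment - LoanIssue]
wide[, Fees := InitiationFee + AdministrationFee]  # hypothetical extra calculation
```

The trade-off is that summing silently merges duplicate rows of the same type, so this only fits if a total per type is what each calculation needs.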
akrun answered Nov 01 '25 17:11



