In other words, I want to group on a column and then perform calculations using only some of the rows per group.
The data set I have is:
LoanRefId Tran_Type TransactionAmount
103 11 LoanIssue 1000.0000
104 11 InitiationFee 171.0000
105 11 Interest 59.6729
106 11 AdministrationFee 64.9332
107 11 RaisedClientInstallment 1295.5757
108 11 ClientInstallment 1295.4700
109 11 PaidUp 0.0000
110 11 Adjustment 0.1361
111 11 PaidUp 0.0000
112 12 LoanIssue 3000.0000
113 12 InitiationFee 399.0000
114 12 Interest 94.9858
115 12 AdministrationFee 38.6975
116 12 RaisedClientInstallment 3532.6350
117 12 ClientInstallment 3532.6100
118 12 PaidUp 0.0000
119 12 Adjustment 0.0733
120 12 PaidUp 0.0000
I would like to repeat the following calculation for each group:
ClientInstallment - LoanIssue.
So, group 1 will be for LoanRefId number 11. The calculation will take ClientInstallment of 1295.47 and subtract LoanIssue of 1000 to give me a new column, call it "Income, with value 295.47.
Is this possible using data.table or dplyr or any other clever tricks.
Alternatively I can create two data summaries, one for Clientinstallment and one for LoanIssue and then subtract them, but the truth is I need to do much more than just subtracting two numbers, so I would need a data summary for each calculation which is just plain unclever imho.
any help is appreciated
We can use data.table. We convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'LoanRefId', we get the corresponding 'TransactionAmount' for 'Tran_Type's 'ClientInstallment' and 'LoanIssue' and then subtract it.
library(data.table)
setDT(df1)[,list(Income =TransactionAmount[Tran_Type=='ClientInstallment']-
TransactionAmount[Tran_Type=='LoanIssue']), by = LoanRefId]
# LoanRefId Income
#1: 11 295.47
#2: 12 532.61
We can also use dplyr with similar appraoch
df1 %>%
group_by(LoanRefId) %>%
summarise(Income = TransactionAmount[Tran_Type=='ClientInstallment']-
TransactionAmount[Tran_Type=='LoanIssue'])
If we don't have a 'ClientInstallment' or 'LoanIssue' for a 'LoanRefId', we can use an if/else condition
setDT(df1)[, list(Income= if(any(Tran_Type=='ClientInstallment') &
any(Tran_Type=='LoanIssue'))
TransactionAmount[Tran_Type=='ClientInstallment'] -
TransactionAmount[Tran_Type=='LoanIssue'] else 0 ), by = LoanRefId]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With