I have the following data.table (dt) in R.
dt <- fread("
id| rowids | charge | payment | balance
a | 1 | 7.1 | 0 |
a | 2 | 1.2 | 3 |
a | 3 | 1.7 | 1 |
b | 1 | 8.1 | 0 |
b | 2 | 2.5 | 4 |
b | 3 | 2.3 | 2 |
b | 4 | 3.2 | 1 |
",
sep = "|",
colClasses = c("character", "numeric", "numeric", "numeric",
"numeric"))
The "balance" should be computed, within each id group, as "balance <- previous.row.balance + charge - payment", where "previous.row.balance" is the "balance" entry of the previous row.
I initially underestimated the difficulty of computing the running balance. I was thinking of dt[, previous.row.balance := shift(balance, 1), by = id]. But R does vectorized computation, so there are no values in "balance" for shift() to work on: "balance" itself has to be computed through row-by-row iteration.
I searched on StackOverflow and found a similar question and its first answer greatly helped me to think through the whole process. I adapted the code in the first answer to my problem and got the following code working wonderfully to generate the running balance by group.
dt[rowids == 1, balance := charge, by=.(id)]
dt[rowids != 1, balance :=
     dt[,
        {
          balance1 <- balance[1L]
          .SD[rowids != 1,
              {
                balance1 <- balance1 + charge - payment
                .(balance1)
              },
              by=.(rowids)]
        },
        by=.(id)][, -1L:-2L]
   ]
Here are my questions.
In by=.(id)][, -1L:-2L], the chained brackets worked the iteration out. Since the code does not employ shift() with by = group, I guess [, -1L:-2L] does the trick here to perform the iteration. But how? What does [, -1L:-2L] actually do here?
Sorry that I have to ask this question here instead of commenting or asking under that question. The reason is that I am brand new to StackOverflow with only 1 point of reputation, so I am not allowed to comment on the original answer to that question. I would also like to upvote that answer, but I have to earn more points before I can do that.
Any insight or thought is appreciated!
Regarding your question #2:
You can use the cumsum function (the output matches that of the code in the question). This takes the value of charge - payment for the first row, then for the second row the second charge - payment is added to that, et cetera.
dt[, balance2 := cumsum(charge - payment), id]
dt
#    id rowids charge payment balance balance2
# 1:  a      1    7.1       0     7.1      7.1
# 2:  a      2    1.2       3     5.3      5.3
# 3:  a      3    1.7       1     6.0      6.0
# 4:  b      1    8.1       0     8.1      8.1
# 5:  b      2    2.5       4     6.6      6.6
# 6:  b      3    2.3       2     6.9      6.9
# 7:  b      4    3.2       1     9.1      9.1
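To see why cumsum is enough here, it may help to write the recurrence out explicitly and compare. The sketch below is only an illustration: it rebuilds the sample data with data.table() instead of fread(), the column names balance_rec and balance_cum are made up for this example, and it assumes (as in the sample data) that the first row of each id has payment 0, so starting from charge and starting from charge - payment give the same thing.

```r
library(data.table)

# Same sample values as in the question, without the empty balance column.
dt <- data.table(
  id      = c("a", "a", "a", "b", "b", "b", "b"),
  rowids  = c(1, 2, 3, 1, 2, 3, 4),
  charge  = c(7.1, 1.2, 1.7, 8.1, 2.5, 2.3, 3.2),
  payment = c(0, 3, 1, 0, 4, 2, 1)
)

# Explicit recurrence: new balance = previous balance + (charge - payment),
# carried forward one row at a time within each id.
dt[, balance_rec := Reduce(function(prev, net) prev + net,
                           charge - payment,
                           accumulate = TRUE),
   by = id]

# Vectorized form: a running sum of the net change per row.
dt[, balance_cum := cumsum(charge - payment), by = id]

# Compare the two columns (all.equal allows for floating-point rounding).
all.equal(dt$balance_rec, dt$balance_cum)
```

Because each balance depends on the previous one only through addition, the row-by-row iteration collapses into a cumulative sum, which is why no shift() on a not-yet-computed column is needed.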
Since @IceCreamToucan has answered part 2 (how to improve the code), I'll just cover part 1 (why x[, -1:-2] works). From ?data.table, we know that in general the j field can be used to select columns:
When j is a vector of column names or positions to select (as in data.frame) [, then it behaves as with a data.frame].
(The words in brackets are my edit to complete the sentence.)
In particular, when j takes the form n:m, ...
You would also see this behavior with j set to -c(1,2) or !c(1,2) or !(1:2) or -(1:2).
This behavior is based on special parsing of j to check for : or ! or - being the top-level function.
Next, it is important to know that the columns in by= are put as the first columns in the table.
Combining these two points in the OP's example, you have by=id as the first column (the outer by) and by=rowids as the second column (the inner by). After these are dropped with [, -1L:-2L] you have the .(balance1) expression remaining.
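A small sketch of both points on a toy table may make this concrete. The table x, its columns g, k, v and the name total are all hypothetical and not the OP's data; the nesting (an outer by with an inner by on .SD) just mirrors the shape of the code in the question.

```r
library(data.table)

# Hypothetical toy table, only for illustration.
x <- data.table(g = c("a", "a", "b"),
                k = c(1, 2, 1),
                v = c(10, 20, 30))

# Nested grouping: outer by = g, inner by = k on .SD.
res <- x[, .SD[, .(total = sum(v)), by = .(k)], by = .(g)]

# The grouping columns come first (outer, then inner), followed by the j result.
names(res)

# Dropping columns 1 and 2 leaves only the computed column.
names(res[, -1L:-2L])
```

So [, -1L:-2L] does no iteration itself. It only strips the two grouping columns so that the single remaining column can be assigned with :=.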