I have a data.table like the following:
   Sim j active cost
1:   1 1      1  100
2:   1 2      1  125
3:   1 3      0  200
4:   1 4      1  250
5:   2 1      1  100
6:   2 2      0   50
7:   2 3      0  125
8:   2 4      1  200
library(data.table)
dt <- data.table(Sim = c(1, 1, 1, 1, 2, 2, 2, 2),
                 j = c(1, 2, 3, 4, 1, 2, 3, 4),
                 active = c(1, 1, 0, 1, 1, 0, 0, 1),
                 cost = c(100, 125, 200, 250, 100, 50, 125, 200))
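I want to add a column 'incr_cost' that holds the cost in each row i minus the cost in a different row, which I'll call row k, where row k meets these conditions: (1) it has the same Sim as row i, (2) it is active (active = 1), and (3) it has the largest j value that is less than the j of row i.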
For rows where j=1, incr_cost can just be NA.
In my example, the solution would look like:
   Sim j active cost incr_cost
1:   1 1      1  100        NA
2:   1 2      1  125        25
3:   1 3      0  200        75
4:   1 4      1  250       125
5:   2 1      1  100        NA
6:   2 2      0   50       -50
7:   2 3      0  125        25
8:   2 4      1  200       100
It seems like this is similar to applications of shift, except that instead of 'shifting' on the data.table as is, I want to shift on a row-reduced data.table where rows not meeting my conditions are filtered out. I'm having a hard time understanding how to identify the row that has the largest j value that is less than the j of my current row (and meets the other two conditions).
The following works except that it does not consider whether a row is active when selecting row k:
dt[, incr_cost := cost - shift(cost, fill=NA), by=Sim]
I am using r data.table, but non-data.table solutions are also welcome. Thank you!
You can use a rolling join:
dt[, v :=
cost - .SD[.(active = 1, Sim = Sim, j = j - 1), on=.(active, Sim, j), roll=TRUE, x.cost]]
   Sim j active cost   v
1:   1 1      1  100  NA
2:   1 2      1  125  25
3:   1 3      0  200  75
4:   1 4      1  250 125
5:   2 1      1  100  NA
6:   2 2      0   50 -50
7:   2 3      0  125  25
8:   2 4      1  200 100
This looks up the tuples .(active = 1, Sim = Sim, j = j - 1) and, when an exact match is not found, "rolls" to the last j value that fits, if any.
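To see what the rolling lookup returns before it is subtracted from cost, the tuples can be built as a separate table and joined on their own. This is only a sketch of the same idea: the names lookup, matched, wanted_j, matched_j and matched_cost are made up for illustration and do not appear in the answer above.

```r
library(data.table)

dt <- data.table(Sim = c(1, 1, 1, 1, 2, 2, 2, 2),
                 j = c(1, 2, 3, 4, 1, 2, 3, 4),
                 active = c(1, 1, 0, 1, 1, 0, 0, 1),
                 cost = c(100, 125, 200, 250, 100, 50, 125, 200))

# One lookup tuple per row of dt: same Sim, active fixed at 1, j one step back.
# (lookup is a hypothetical helper name.)
lookup <- dt[, .(active = 1, Sim, j = j - 1)]

# Rolling join of dt (x) against the tuples (i). With roll = TRUE, a tuple with
# no exact match on j falls back to the nearest smaller j that exists within
# the same (active, Sim) combination; if there is none, the result is NA.
matched <- dt[lookup, on = .(active, Sim, j), roll = TRUE,
              .(Sim = i.Sim, wanted_j = i.j, matched_j = x.j, matched_cost = x.cost)]
print(matched)
```

For the example data, matched_cost should come out as NA, 100, 125, 125, NA, 100, 100, 100, which is the vector that gets subtracted from cost to give v.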
How it works
In j of x[i, j], .SD is just a shorthand for the table itself, the "Subset of Data".
In j of a join x[i, on=, roll=, j] ... x.* refers to columns of x (here, .SD); and similarly i.* would be a prefix for columns of i (here, the tuples). (OP's use of j as a name might make this confusing. I mean j, the argument in DT[i, j, ...].)
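As a cross-check on the join, the "shift on a row-reduced table" idea from the question can also be sketched without a join: blank out the cost of inactive rows, shift by one, and carry the last non-missing value forward. This is an alternative under assumptions, not part of the answer above: it assumes a data.table version that provides fifelse and nafill, that rows can be ordered by j within Sim, and that cost itself has no missing values. The column name prev_active_cost is made up for illustration.

```r
library(data.table)

dt <- data.table(Sim = c(1, 1, 1, 1, 2, 2, 2, 2),
                 j = c(1, 2, 3, 4, 1, 2, 3, 4),
                 active = c(1, 1, 0, 1, 1, 0, 0, 1),
                 cost = c(100, 125, 200, 250, 100, 50, 125, 200))

# Make the ordering assumption explicit: rows sorted by j within each Sim.
setorder(dt, Sim, j)

# Keep cost only for active rows, shift down one row so the current row is
# excluded, then carry the last observed active cost forward (LOCF).
dt[, prev_active_cost := nafill(shift(fifelse(active == 1, cost, NA_real_)),
                                type = "locf"),
   by = Sim]

# Incremental cost relative to the most recent earlier active row.
dt[, incr_cost := cost - prev_active_cost]
print(dt)
```

On the example data this should reproduce the incr_cost column requested in the question; the rolling join remains the more general tool when the lookup key is not simply "the previous row in order".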