I have a data.table
with ordered data labled up, and I want to add a column that tells me how many records until I get to a "special" record that resets the countdown.
For example:
DT = data.table(idx = c(1,3,3,4,6,7,7,8,9),
name = c("a", "a", "a", "b", "a", "a", "b", "a", "b"))
setkey(DT, idx)
#manually add the answer
DT[, countdown := c(3,2,1,0,2,1,0,1,0)]
Gives
> DT
idx name countdown
1: 1 a 3
2: 3 a 2
3: 3 a 1
4: 4 b 0
5: 6 a 2
6: 7 a 1
7: 7 b 0
8: 8 a 1
9: 9 b 0
See how the countdown column tells me how many rows until a row called "b". The question is how to create that column in code.
Note that the key is not evenly spaced and may contain duplicates (so is not very useful in solving the problem). In general the non-b names could be different, but I could add a dummy column that is just True/False if the solution requires this.
Here's another idea:
## Create groups that end at each occurrence of "b"
DT[, cd:=0L]
DT[name=="b", cd:=1L]
DT[, cd:=rev(cumsum(rev(cd)))]
## Count down within them
DT[, cd:=max(.I) - .I, by=cd]
# idx name cd
# 1: 1 a 3
# 2: 3 a 2
# 3: 3 a 1
# 4: 4 b 0
# 5: 6 a 2
# 6: 7 a 1
# 7: 7 b 0
# 8: 8 a 1
# 9: 9 b 0
I'm sure (or at least hopeful) that a purely "data.table" solution would be generated, but in the meantime, you could make use of rle
. In this case, you're interested in reversing the countdown, so we'll use rev
to reverse the "name" values before proceeding.
output <- sequence(rle(rev(DT$name))$lengths)
makezero <- cumsum(rle(rev(DT$name))$lengths)[c(TRUE, FALSE)]
output[makezero] <- 0
DT[, countdown := rev(output)]
DT
# idx name countdown
# 1: 1 a 3
# 2: 3 a 2
# 3: 3 a 1
# 4: 4 b 0
# 5: 6 a 2
# 6: 7 a 1
# 7: 7 b 0
# 8: 8 a 1
# 9: 9 b 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With