I'm trying to add new columns to data.table
, where values in rows depend on the relative relationship of the values in the row. To be more precise, if there is a value X in a row, I would like to know how many other values are in the same column (and group), that are within X-30.
That is, given this:
DT<-data.table(
X = c(1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1),
Y = c(100, 101, 133, 134, 150, 156, 190, 200, 201, 230, 233, 234),
Z = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))
I would like to get a new column, with values:
N <- c(0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 2)
I've tried the following, but I don't get the results I could use:
DT[,list(Y,num=cumsum(Y[-.I]>DT[.I,Y]-30),Z),by=.(X)]
Any ideas how to do this?
This is probably can be achieved with a rolling join (?), but here is a foverlaps
alternative for now
DT[, `:=`(indx = .I, Y2 = Y - 30L, N = 0L)] # Add row index and a -30 interval
setkey(DT, X, Y2, Y) # Sort by X and the intervals (for fovelaps)
res <- foverlaps(DT, DT)[Y2 > i.Y2, .N, keyby = indx] # Run foverlaps and check what can we catch
setorder(DT, indx) # go back to the original order
DT[res$indx, N := res$N][, c("indx", "Y2") := NULL] # update results and remove cols
DT
# X Y Z N
# 1: 1 100 1 0
# 2: 2 101 2 0
# 3: 2 133 3 0
# 4: 1 134 4 0
# 5: 1 150 5 1
# 6: 2 156 6 1
# 7: 1 190 7 0
# 8: 2 200 8 0
# 9: 2 201 9 1
# 10: 1 230 10 0
# 11: 1 233 11 1
# 12: 1 234 12 2
Alternately, use the which=TRUE
option of foverlaps
to make the overlap merge smaller:
# as above
DT[, `:=`(indx = .I, Y2 = Y - 30L, N = 0L)]
setkey(DT, X, Y2, Y)
# using which=TRUE:
res <- foverlaps(DT, DT, which=TRUE)[xid > yid, .N, by=xid]
DT[res$xid, N := res$N]
setorder(DT, indx)
DT[, c("Y2","indx") := NULL]
Here's another way:
DT[order(Y), N := 0:(.N-1) - findInterval(Y - 30, Y), by = X]
all.equal(DT$N,N) # TRUE
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With