I have a sequence as a toy example. How can I find the longest run of consecutive values in it? So far I can find where the breaking points are, but how can I get the values themselves?
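library(data.table)
DT <- data.table(X = c(3:7, 16:18, 22:29, 31:36))
# Y is the previous value minus the current one; it is -1 inside a consecutive run
DT[, Y := shift(X, type = "lag", fill = -1)][, Y := Y - X]
# row indices where a new run starts
with(DT, which(Y != -1))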
What I hope to get is the values of that sub-sequence, which in this case should be c(22, 23, 24, 25, 26, 27, 28, 29).
I'm not sure what your expected output is, but here we add the length of each consecutive run as a new column in the data.table:
library(data.table)
DT[, length := .N, by = cumsum(c(1, diff(X) != 1))]
DT
# X length
# 1: 3 5
# 2: 4 5
# 3: 5 5
# 4: 6 5
# 5: 7 5
# 6: 16 3
# 7: 17 3
# 8: 18 3
# 9: 22 8
#10: 23 8
#11: 24 8
#12: 25 8
#13: 26 8
#14: 27 8
#15: 28 8
#16: 29 8
#17: 31 6
#18: 32 6
#19: 33 6
#20: 34 6
#21: 35 6
#22: 36 6
# X length
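To see why the grouping expression works, it may help to look at it on the bare vector. This is only an illustrative sketch: the names `breaks` and `run_id` are not part of the answer, and `TRUE` is used in place of the leading `1`, which gives the same cumulative sum.

```r
X <- c(3:7, 16:18, 22:29, 31:36)

# TRUE wherever the step from the previous value is not 1;
# the leading TRUE opens the first run
breaks <- c(TRUE, diff(X) != 1)

# each TRUE bumps the counter, so every consecutive run gets its own id
run_id <- cumsum(breaks)

# number of elements in each run
tabulate(run_id)
# 5 3 8 6
```

Grouping by `run_id` and taking `.N` is what fills the `length` column above.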
Then, if you want to extract only the rows that belong to the longest run, we can do
DT[length == max(length), ]
# X length
#1: 22 8
#2: 23 8
#3: 24 8
#4: 25 8
#5: 26 8
#6: 27 8
#7: 28 8
#8: 29 8
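If a plain vector is wanted rather than a data.table, the same grouping idea can be sketched in base R. This is an alternative to the data.table approach rather than part of it, and the names `runs` and `longest` are only illustrative. Note that `which.max()` returns the first maximum, so if several runs tie for the longest, only the first is returned.

```r
X <- c(3:7, 16:18, 22:29, 31:36)

# split the vector into its consecutive runs
runs <- split(X, cumsum(c(TRUE, diff(X) != 1)))

# pick the run with the most elements (the first one in case of ties)
longest <- runs[[which.max(lengths(runs))]]
longest
# 22 23 24 25 26 27 28 29
```

With the data.table above, `DT[length == max(length), X]` should give the same vector, although it would return the values of all tied runs together.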