
How to find the longest consecutive run in a longer sequence in R

Tags: r, data.table

I have a sequence like the toy example below. How can I find its longest run of consecutive values? So far I can find where the breaks are, but how do I get the values themselves?

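library(data.table)
DT <- data.table(X = c(3:7, 16:18, 22:29, 31:36))
# Y = previous X minus current X; it is -1 wherever a run continues
DT[, Y := shift(X, type = "lag", fill = -1L) - X]
# row indices where a new run starts
with(DT, which(Y != -1))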

What I hope to get is the values of the longest subsequence, which in this case should be c(22, 23, 24, 25, 26, 27, 28, 29).

Grec001 asked Dec 18 '22 15:12


1 Answer

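I'm not sure what your expected output is, but here we add the length of each consecutive run as a new column in the data.table. The grouping key cumsum(c(1, diff(X) != 1)) goes up by one at every break, so each run gets its own group, and .N is the number of rows in that group.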

library(data.table)
DT[, length := .N, by = cumsum(c(1, diff(X) != 1))]

DT
#     X length
# 1:  3      5
# 2:  4      5
# 3:  5      5
# 4:  6      5
# 5:  7      5
# 6: 16      3
# 7: 17      3
# 8: 18      3
# 9: 22      8
#10: 23      8
#11: 24      8
#12: 25      8
#13: 26      8
#14: 27      8
#15: 28      8
#16: 29      8
#17: 31      6
#18: 32      6
#19: 33      6
#20: 34      6
#21: 35      6
#22: 36      6
#     X length
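
To see why that grouping key works, it can help to look at the intermediate vectors on their own. The following is a minimal base R sketch; the names brk and run_id are mine for illustration and do not appear in the answer above.

```r
X <- c(3:7, 16:18, 22:29, 31:36)

# TRUE wherever the step from the previous value is not exactly 1,
# i.e. at the first element of every run after the first one
brk <- diff(X) != 1

# Prepend 1 for the first element, then accumulate: the counter
# goes up by one at every break, so each run gets its own id
run_id <- cumsum(c(1, brk))

# Run id of every element, then the number of elements per run
print(run_id)
print(table(run_id))
```

Each distinct value of run_id corresponds to one consecutive run, which is what by = groups on in the data.table call.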

Then, to keep only the rows that belong to the longest run, filter on the maximum length:

DT[length == max(length), ]

#    X length
#1: 22      8
#2: 23      8
#3: 24      8
#4: 25      8
#5: 26      8
#6: 27      8
#7: 28      8
#8: 29      8
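
If a plain vector is wanted rather than a table, selecting the column in j, as in DT[length == max(length), X], should return just the values. The same idea can also be sketched in base R without data.table; the names runs and longest below are illustrative assumptions, not part of the answer. Note that this version keeps only the first run when several runs tie for the maximum length, whereas the filter above keeps the rows of all tied runs.

```r
X <- c(3:7, 16:18, 22:29, 31:36)

# Split X into a list of runs, using the same break-counting key
runs <- split(X, cumsum(c(1, diff(X) != 1)))

# which.max() returns the first maximum, so ties resolve to the earliest run
longest <- runs[[which.max(lengths(runs))]]

print(longest)
```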
Ronak Shah answered May 26 '23 07:05