Have a sequence of numbers.
seqNum <- sample(1:15, 30, replace = T)
[1] 10 7 6 5 4 1 15 11 7 15 1 2 3 14 8 3 5 10 8 3 14 8 14 3 14 12 15 12 10 14
Define "Loop": From the first number in the sequence, all numbers between two repeat number is defined as a loop. For example, the first loop in the given sequence above is "7 6 5 4 1 15 11". Remove it from the sequence then repeat the previous method to find next loop.
Sequence after remove the first loop became:
10 15 1 2 3 14 8 3 5 10 8 3 14 8 14 3 14 12 15 12 10 14
Second loop is "3 14 8".
Sequence after remove the second loop became:
10 15 1 2 5 10 8 3 14 8 14 3 14 12 15 12 10 14
Third loop is "10 15 1 2 5". Sequence after remove the third loop became:
8 3 14 8 14 3 14 12 15 12 10 14
Fourth loop is "8 3 14". Sequence after remove the fourth loop became:
14 3 14 12 15 12 10 14
Fifth loop is "14 12 15 12 10".(Always using 2 closest repeated number) Sequence after remove the fifth loop became:
14 3
DONE.
Note:
"loop" must contains at least 3 numbers
two loop consider to be the same loop if
Want:
Find all loops in the given sequence and give the count of each unique loop.
Desired result:
count loop
1 1 7-6-5-4-1-15-11
2 2 3-14-8
3 1 10-15-1-2-5
4 1 14-12-15-12-10
I understand this is a long question but I will appreciate any hints. Thank you!!
PS: This suppose to search loops on a very long sequence of numbers, say total around 10^8, please use sample(1:1024, 100000, replace = T) to test.
the following code will find and print all 'loops' working on the example as expected:
seqNum <- c(10, 7, 6, 5, 4, 1, 15, 11, 7, 15, 1, 2, 3, 14, 8, 3, 5, 10, 8, 3, 14, 8, 14, 3, 14, 12, 15, 12, 10, 14)
loops <- matrix(ncol = 2,nrow = 0, dimnames = list(numeric(0),c("count","loop")))
remove_loop <- function(seqNum) {
ht <- new.env()
for(j in 1:length(seqNum)) {
i <- seqNum[j]
key <- as.character(i)
if(exists(key,envir=ht)) {
lastIdx <- ht[[key]]
loop <- seqNum[lastIdx:(j-1)]
if(length(unique(loop)) > 2) {
return(list(loop=loop,newSeqNum=seqNum[-(lastIdx:j)]))
}
}
ht[[key]] <- j
}
}
newSeqNum <- seqNum
repeat{
l <- remove_loop(newSeqNum)
newSeqNum <- l$newSeqNum
if(length(l$loop)){
print(l$loop)
} else {
break
}
}
The output is
[1] 7 6 5 4 1 15 11
[1] 3 14 8
[1] 10 15 1 2 5
[1] 8 3 14
[1] 14 12 15 12 10
It works by repeatedly removing the next loop, i.e. calling the function remove_loop
on the remaining sequence. The function remove_loop
works as follows: Using an environment as a hash table (variable ht
) it keeps track of the last index each number was encountered at. If a number has been seen at least a second time (i.e. has an entry in the hash table) then a 'loop' may have been found. If the number has been seen more than two (say k) times it is true that the first (k-1) occurences did not meet the criteria for a 'loop'. Thus only the current and the last occurence are relevant. The current occurence is at position j
and the previous occurence can be found in the hash table. If this 'loop' contains at least three distinct number a 'loop' has been found.
Edit: The following code counts all the occurences of different loops. Loops are represented in a 'canonical' form by wrapping them around such that they start with the smallest element:
repr_loop <- function(l) {
idx <- which.min(l)
if(idx != 1) l <- c(l[idx:length(l)],l[1:(idx-1)])
paste0(l,collapse="-")
}
loops <- data.frame(count=numeric(),loop=character())
newSeqNum <- seqNum
repeat{
l <- remove_loop(newSeqNum)
newSeqNum <- l$newSeqNum
if(length(l$loop)){
s <- repr_loop(l$loop)
idx <- match(s,loops[,"loop"])
if(!is.na(idx)) {
loops[idx,"count"] <- loops[idx,"count"] + 1
} else {
loops <- rbind(loops,data.frame(count=1,loop=s))
}
} else {
break
}
}
loops
which yields the following output
> loops
count loop
1 1 1-15-11-7-6-5-4
2 2 3-14-8
3 1 1-2-5-10-15
4 1 10-14-12-15-12
You can try
Your data
d <- c(10,7,6,5,4,1,15,11,7,15,1,2,3,14,8,3,5,10,8,3,14,8,14,3,14,12,15,12,10,14)
I included everything in a function to easily use the code. The main idea is, to search for duplicates, check the length between the first duplicate and the first occurence of the number. If it is euqual or longer than three quit the loop and extract the segment, then update the vector and do all things as long as there are no further duplicates (while
). I have to say that there could be problems when two loop segments are occuring at the same time in the vector.
foo <- function(x){
d1 <- x
res <- NULL # vector for the results
while(any(duplicated(d1))){
gr <- which(duplicated(d1))
for(i in gr){
# here the magic happens
pos <- which(d1 == d1[i])
gr_pos <- which(diff(pos) >= 3)
pos <- pos[c(gr_pos,gr_pos+1)]
if(pos[2]-pos[1] >= 3) break
}
# extract the "loop" sequences
extract <- d1[seq(pos[1],pos[2])][-length(seq(pos[1],pos[2]))]
res <- append(res,paste(sort(extract), collapse = "-")) # save the loop
d1 <- d1[-seq(pos[1],pos[2])] # update input vector
if(length(d1) < 3) break # emergency stop
}
data.frame(table(res)) # output
}
foo(d)
res Freq
1 1-2-5-10-15 1
2 1-4-5-6-7-11-15 1
3 10-12-12-14-15 1
4 3-8-14 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With