
How to count a repeating part of a sequence in R?

Is it possible to count a repeating part of a sequence in R? For example:

x <- c(1,3.0,3.1,3.2,1,1,2,3.0,3.1,3.2,4,4,5,6,5,3.0,3.1,3.2,
       3.1,2,1,4,6,4.0,4,3.0,3.1,3.2,5,3.2,3.0,4)

Is it possible to count the number of times the subsequence 3.0,3.1,3.2 occurs? In this example the answer should be 4.

user2531964 Avatar asked Dec 16 '22 09:12



2 Answers

I'd do something like this:

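pattern <- c(3, 3.1, 3.2)
# Start index of every window of length(pattern) in x
len1 <- seq_len(length(x) - length(pattern) + 1)
# Offsets within a window: 0, 1, ..., length(pattern) - 1
len2 <- seq_len(length(pattern)) - 1
# One window per column; count the columns that match pattern in every position
sum(colSums(matrix(x[outer(len1, len2, '+')],
                   ncol = length(len1), byrow = TRUE) == pattern) == length(len2))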

PS: Replacing the outer sum with which gives you the start index of each match instead of the count.
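
To make the PS concrete, here is a small sketch of the same code with which in place of the outer sum. The intermediate names windows and starts are only for illustration and are not part of the answer above.

```r
x <- c(1,3.0,3.1,3.2,1,1,2,3.0,3.1,3.2,4,4,5,6,5,3.0,3.1,3.2,
       3.1,2,1,4,6,4.0,4,3.0,3.1,3.2,5,3.2,3.0,4)
pattern <- c(3, 3.1, 3.2)

len1 <- seq_len(length(x) - length(pattern) + 1)  # window start indices
len2 <- seq_len(length(pattern)) - 1              # offsets inside a window

# 3 x 30 matrix: column j holds x[j], x[j + 1], x[j + 2]
windows <- matrix(x[outer(len1, len2, '+')],
                  ncol = length(len1), byrow = TRUE)

# which() returns the columns (start positions) where every element matches
starts <- which(colSums(windows == pattern) == length(pattern))
starts
# [1]  2  8 16 26
```

length(starts) gives back the count of 4 from the question.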

Arun Avatar answered Dec 28 '22 23:12



One more (generic moving window) approach:

x <- c(1,3.0,3.1,3.2,1,1,2,3.0,3.1,3.2,4,4,5,6,5,3.0,3.1,3.2, 3.1,2,1,4,6,4.0,4,3.0,3.1,3.2,5,3.2,3.0,4)
s <- c(3, 3.1, 3.2)

sum(apply(embed(x, length(s)), 1, function(y) {all(y == rev(s))}))
# [1] 4

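Look at the output of embed(x, length(s)) to see what's happening: each row is one window of x in reverse order, which is why the comparison is against rev(s).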
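
For a quick look at what embed returns, here is a toy vector instead of x. The name e is only for illustration.

```r
# embed() comes from the stats package, which is attached by default
e <- embed(1:5, 3)
e
#      [,1] [,2] [,3]
# [1,]    3    2    1
# [2,]    4    3    2
# [3,]    5    4    3

# Each row is a window of length 3, newest value first,
# so the window 1, 2, 3 shows up as 3, 2, 1
all(e[1, ] == rev(c(1, 2, 3)))
# [1] TRUE
```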

As Arun points out, apply is pretty slow here. Combining embed with Arun's matrix trick makes this a lot faster:

sum(colSums(matrix(embed(x, length(s)),
                   byrow = TRUE, nrow = length(s)) == rev(s)) == length(s))
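
If you need this in more than one place, the expression above could be wrapped in a small function. This is only a sketch: the name count_subseq and the length guard are my own assumptions, not part of either answer. It also compares with ==, so it assumes the values are exactly equal, as they are for the typed literals in the question.

```r
# Hypothetical helper, not from the answers above
count_subseq <- function(x, s) {
  n <- length(s)
  # embed() stops with an error if n is 0 or larger than length(x)
  if (n == 0 || n > length(x)) return(0L)
  # One reversed window per column, as in the expression above
  windows <- matrix(embed(x, n), byrow = TRUE, nrow = n)
  sum(colSums(windows == rev(s)) == n)
}

x <- c(1,3.0,3.1,3.2,1,1,2,3.0,3.1,3.2,4,4,5,6,5,3.0,3.1,3.2,
       3.1,2,1,4,6,4.0,4,3.0,3.1,3.2,5,3.2,3.0,4)
count_subseq(x, c(3, 3.1, 3.2))
# [1] 4
```

Overlapping matches are counted separately, so count_subseq(c(1, 2, 1, 2, 1), c(1, 2, 1)) returns 2.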
eddi Avatar answered Dec 28 '22 22:12
