Is it possible to count a repeating part of a sequence in R? For example:
x <- c(1,3.0,3.1,3.2,1,1,2,3.0,3.1,3.2,4,4,5,6,5,3.0,3.1,3.2,
       3.1,2,1,4,6,4.0,4,3.0,3.1,3.2,5,3.2,3.0,4)
Is it possible to count how many times the subsequence 3.0, 3.1, 3.2 occurs? In this example the answer should be 4.
I'd do something like this:
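pattern <- c(3, 3.1, 3.2)
# Start index of every window, and the offsets within a window
len1 <- seq_len(length(x) - length(pattern) + 1)
len2 <- seq_len(length(pattern)) - 1
# Build a matrix with one window per column, then count the columns
# in which every element matches the pattern
sum(colSums(matrix(x[outer(len1, len2, '+')],
                   ncol = length(len1), byrow = TRUE) == pattern) == length(len2))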
PS: if you replace the outer sum with which, you get the start index of each occurrence instead of the count.
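A minimal sketch of that which variant, wrapped in a helper. The name find_starts is hypothetical and not part of the original answer, and the sketch assumes that exact comparison with == is adequate for the data (it is here, because the values in x and in the pattern are typed as the same literals).

```r
# Hypothetical helper (name is illustrative): start indices of every
# occurrence of `pattern` in `x`, using the same window-matrix idea.
find_starts <- function(x, pattern) {
  k <- length(pattern)
  n <- length(x) - k + 1
  if (k == 0 || n < 1) return(integer(0))
  starts  <- seq_len(n)
  offsets <- seq_len(k) - 1
  # One column per window: column j holds x[j], ..., x[j + k - 1]
  windows <- matrix(x[outer(starts, offsets, "+")], ncol = n, byrow = TRUE)
  which(colSums(windows == pattern) == k)
}

x <- c(1,3.0,3.1,3.2,1,1,2,3.0,3.1,3.2,4,4,5,6,5,3.0,3.1,3.2,
       3.1,2,1,4,6,4.0,4,3.0,3.1,3.2,5,3.2,3.0,4)
starts <- find_starts(x, c(3, 3.1, 3.2))
starts
# [1]  2  8 16 26
length(starts)
# [1] 4
```

Taking length() of the result gives back the count, so one helper covers both questions.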
One more (generic moving window) approach:
x <- c(1,3.0,3.1,3.2,1,1,2,3.0,3.1,3.2,4,4,5,6,5,3.0,3.1,3.2, 3.1,2,1,4,6,4.0,4,3.0,3.1,3.2,5,3.2,3.0,4)
s <- c(3, 3.1, 3.2)
sum(apply(embed(x, length(s)), 1, function(y) {all(y == rev(s))}))
# [1] 4
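Look at the output of embed(x, length(s)) to see what is happening: each row is one window of x in reverse order, which is why the comparison uses rev(s).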
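To make the row layout concrete, here is a small illustration on a toy vector (v is made up for this example and is not from the question):

```r
v <- c(10, 20, 30, 40, 50)
m <- embed(v, 3)
m
#      [,1] [,2] [,3]
# [1,]   30   20   10
# [2,]   40   30   20
# [3,]   50   40   30

# Row i holds v[i + 2], v[i + 1], v[i], so reversing a row
# recovers the window in its original order.
rev(m[1, ])
# [1] 10 20 30
```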
As Arun points out, apply is pretty slow here; combining embed with Arun's matrix trick makes this a lot faster:
sum(colSums(matrix(embed(x, length(s)),
                   byrow = TRUE, nrow = length(s)) == rev(s)) == length(s))
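A rough way to check the speed difference yourself is sketched below. The test vector big and the wrapper names count_apply and count_matrix are assumptions made up for this comparison, and the timings will vary by machine, so none are shown.

```r
set.seed(1)
s <- c(3, 3.1, 3.2)
# Hypothetical test data: a long vector drawn from a few values
big <- sample(c(3, 3.1, 3.2, 4), 1e5, replace = TRUE)

# Row-by-row version using apply
count_apply <- function(x, s) {
  sum(apply(embed(x, length(s)), 1, function(y) all(y == rev(s))))
}

# Vectorised version: one window per column, compared in a single step
count_matrix <- function(x, s) {
  sum(colSums(matrix(embed(x, length(s)),
                     byrow = TRUE, nrow = length(s)) == rev(s)) == length(s))
}

system.time(n_apply <- count_apply(big, s))
system.time(n_matrix <- count_matrix(big, s))

# Both versions should agree on the count
n_apply == n_matrix
```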