Is it possible to count a repeating part of a sequence in R? For example:
x <- c(1,3.0,3.1,3.2,1,1,2,3.0,3.1,3.2,4,4,5,6,5,3.0,3.1,3.2,
       3.1,2,1,4,6,4.0,4,3.0,3.1,3.2,5,3.2,3.0,4)
Is it possible to count how many times the subsequence 3.0, 3.1, 3.2 occurs? In this example the result should be 4.
I'd do something like this:
pattern <- c(3, 3.1, 3.2)
len1 <- seq_len(length(x) - length(pattern) + 1)
len2 <- seq_len(length(pattern))-1
sum(colSums(matrix(x[outer(len1, len2, '+')],
ncol=length(len1), byrow=TRUE) == pattern) == length(len2))
PS: by replacing the outer sum() with which() you'll get the start index of each instance.
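A minimal sketch of that variation, reusing x and pattern from the question. The names windows and starts are only illustrative and are not part of the original answer:

```r
x <- c(1, 3.0, 3.1, 3.2, 1, 1, 2, 3.0, 3.1, 3.2, 4, 4, 5, 6, 5, 3.0, 3.1, 3.2,
       3.1, 2, 1, 4, 6, 4.0, 4, 3.0, 3.1, 3.2, 5, 3.2, 3.0, 4)
pattern <- c(3, 3.1, 3.2)

len1 <- seq_len(length(x) - length(pattern) + 1)  # candidate start positions
len2 <- seq_len(length(pattern)) - 1              # offsets within one window

# One column per window; rows follow the order of the pattern
windows <- matrix(x[outer(len1, len2, "+")], ncol = length(len1), byrow = TRUE)

# which() instead of sum(): indices of the windows that match in every position
starts <- which(colSums(windows == pattern) == length(pattern))
starts
# [1]  2  8 16 26
```

length(starts) gives back the count of 4.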
One more (generic moving window) approach:
x <- c(1,3.0,3.1,3.2,1,1,2,3.0,3.1,3.2,4,4,5,6,5,3.0,3.1,3.2, 3.1,2,1,4,6,4.0,4,3.0,3.1,3.2,5,3.2,3.0,4)
s <- c(3, 3.1, 3.2)
sum(apply(embed(x, length(s)), 1, function(y) {all(y == rev(s))}))
# [1] 4
See the output of embed to understand what's happening.
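For instance, on a small illustrative vector (v and m are made-up names, not from the question), embed returns one row per window, with the most recent value first:

```r
v <- 1:5
m <- embed(v, 3)  # 3 rows (one per window), 3 columns
m
#      [,1] [,2] [,3]
# [1,]    3    2    1
# [2,]    4    3    2
# [3,]    5    4    3
```

Because each row is in reversed order, the comparison above is made against rev(s) rather than s.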
As Arun points out, apply here is pretty slow, and one can use embed together with Arun's matrix trick to make this a lot faster:
sum(colSums(matrix(embed(x, length(s)),
byrow = TRUE, nrow = length(s)) == rev(s)) == length(s))
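If this is needed more than once, the same idea could be wrapped in a small helper. The sketch below is only one way to do it: the name count_subseq is hypothetical, t() is used in place of the matrix(..., byrow = TRUE) call (it should give the same layout), and the guards for an empty or over-long pattern are assumptions about the desired behaviour.

```r
# Hypothetical helper: counts (possibly overlapping) occurrences of pattern in x
count_subseq <- function(x, pattern) {
  k <- length(pattern)
  stopifnot(k >= 1)
  if (k > length(x)) return(0L)  # embed() would raise an error in this case
  windows <- t(embed(x, k))      # k rows, one column per window, reversed order
  # Exact == comparison; values produced by arithmetic may need a tolerance
  sum(colSums(windows == rev(pattern)) == k)
}

x <- c(1, 3.0, 3.1, 3.2, 1, 1, 2, 3.0, 3.1, 3.2, 4, 4, 5, 6, 5, 3.0, 3.1, 3.2,
       3.1, 2, 1, 4, 6, 4.0, 4, 3.0, 3.1, 3.2, 5, 3.2, 3.0, 4)
count_subseq(x, c(3, 3.1, 3.2))
# [1] 4
```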