
How to count the number of occurrences of a binary sequence in a vector of observations?

Tags: r, count

In the example below, I would like to know the number of 010 sequences, or the number of 1010 sequences. Below is a working example:

x <- c(1,0,0,1,0,0,0,1,1,1,0,0,1,0,1,0,1,0,1,0,1,0)

In this example, the number of 010 sequences would be 6 and the number of 1010 sequences would be 4 (overlapping occurrences are counted).

What would be the most efficient/simplest way to count the number of consecutive sequences?

Johnny asked Jan 09 '17 19:01


1 Answer

A stringless way:

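f = function(x, patt){
  # candidate starts: every index at which a full-length window still fits in x
  w = seq_len(max(0L, length(x) - length(patt) + 1L))
  # keep only the starts whose k-th element equals patt[k]
  for (k in seq_along(patt)) w <- w[ x[w + k - 1L] == patt[k] ]
  w  # start indices of the (possibly overlapping) matches
}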

length(f(x, patt = c(0,1,0))) # 6
length(f(x, patt = c(1,0,1,0))) # 4
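
Because `f` returns the start positions of the matches rather than a count, it can also show where the overlapping matches sit. Below is a minimal self-contained sketch of the same filtering idea; the name `match_starts` is hypothetical and only used for illustration.

```r
x <- c(1,0,0,1,0,0,0,1,1,1,0,0,1,0,1,0,1,0,1,0,1,0)

# Hypothetical helper: start indices of every (possibly overlapping) match.
match_starts <- function(x, patt) {
  # every index at which a full-length window still fits inside x
  w <- seq_len(max(0L, length(x) - length(patt) + 1L))
  # drop candidate starts whose k-th element differs from patt[k]
  for (k in seq_along(patt)) w <- w[x[w + k - 1L] == patt[k]]
  w
}

starts_010  <- match_starts(x, c(0, 1, 0))
starts_1010 <- match_starts(x, c(1, 0, 1, 0))

print(starts_010)   # 3 12 14 16 18 20
print(starts_1010)  # 13 15 17 19
```

The starts 12, 14, 16, 18 and 20 are two apart while the pattern is three long, so consecutive matches share an element; that is why 010 is counted 6 times.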

Alternatives. From @cryo11, here's another way:

# embed() stores each window in reverse order, so compare against rev(patt)
function(x,patt) sum(apply(embed(x,length(patt)),1,function(w) all(!xor(w,rev(patt)))))

or another variation:

# same idea, vectorised; rev(patt) again accounts for embed()'s reversed windows
function(x,patt) sum(!colSums( xor(rev(patt), t(embed(x,length(patt)))) ))
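
One detail worth checking with `embed`: it places the latest value of each window in the first column, so a non-palindromic pattern has to be reversed before it is compared with the rows. A small sketch of this, assuming 0/1 input; `count_embed` is a hypothetical name, not part of the answer.

```r
# embed() puts the latest value of each window in the first column
m <- embed(1:5, 3)
print(m)
#      [,1] [,2] [,3]
# [1,]    3    2    1
# [2,]    4    3    2
# [3,]    5    4    3

# Hypothetical helper: count windows equal to patt, reversing patt to match embed()
count_embed <- function(x, patt) {
  windows <- embed(x, length(patt))
  sum(apply(windows, 1, function(w) all(w == rev(patt))))
}

y <- c(1, 1, 0, 1, 1, 0)
n_110 <- count_embed(y, c(1, 1, 0))  # windows (1,1,0) start at positions 1 and 4
n_011 <- count_embed(y, c(0, 1, 1))  # window (0,1,1) starts at position 3
print(c(n_110, n_011))               # 2 1
```

The patterns in the question happen to give the same counts either way on this `x` (010 is a palindrome, and 0101 also occurs 4 times), so the orientation only shows up with an asymmetric case like the one above.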

or with data.table:

library(data.table)
# lead offsets 0, 1, ..., length(patt) - 1, so each row is the window starting at that index
setkey(setDT(shift(x, seq_along(patt) - 1L, type = "lead")))[as.list(patt), .N]

(The shift function is very similar to embed.)
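
To make the comparison with `embed` concrete, here is a base-R sketch of what lead-shifting does. `lead` and `count_lead` are hypothetical helpers written for illustration; they are not data.table functions and only approximate the idea.

```r
# Hypothetical base-R stand-in for a "lead" shift: move x forward by k, pad with NA
# (assumes 0 <= k <= length(x))
lead <- function(x, k) c(x[seq_len(length(x) - k) + k], rep(NA, k))

count_lead <- function(x, patt) {
  # column k holds x shifted by k - 1, so row i is the window starting at i
  cols <- lapply(seq_along(patt) - 1L, function(k) lead(x, k))
  # a row matches when every column equals the corresponding pattern element
  hits <- Reduce(`&`, Map(function(col, p) col == p, cols, patt))
  # windows running past the end of x contain NA and are not counted
  sum(hits, na.rm = TRUE)
}

x <- c(1,0,0,1,0,0,0,1,1,1,0,0,1,0,1,0,1,0,1,0,1,0)
n_010  <- count_lead(x, c(0, 1, 0))
n_1010 <- count_lead(x, c(1, 0, 1, 0))
print(c(n_010, n_1010))  # 6 4
```

Unlike `embed`, the shifted columns keep the original left-to-right order, so the pattern is compared as written.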

Frank answered Oct 11 '22 03:10