Find distribution of consecutive zeros

Tags: r

I have a vector, say x, which contains only the integers 0, 1 and 2. For example:

x <- c(0,1,0,2,0,0,1,0,0,1,0,0,0,1,0)

From this I would like to extract how many times zero occurs in each "pattern". In this simple example it occurs three times on its own, twice as 00 and exactly once as 000, so I would like to output something like:

0      3
00     2
000    1

My actual dataset is quite large (1000-2000 elements in the vector), and at least in theory the maximum number of consecutive zeros is length(x).
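Stated precisely, each "pattern" here is a maximal run of zeros: 000 counts once as a run of length three, not additionally as 00 plus 0. A plain loop makes that definition concrete. The sketch below is an editorial illustration, not part of the original question, and count_zero_runs is a hypothetical name. It is useful as a reference for checking results, but growing a vector inside an R loop is slower than the vectorized answers for long inputs.

```r
# Hypothetical helper (not from the original post): scan x once and record
# the length of every maximal run of zeros, then tabulate those lengths.
count_zero_runs <- function(x) {
  run_lengths <- integer(0)  # lengths of completed zero runs
  current <- 0L              # length of the zero run currently in progress
  for (v in x) {
    if (v == 0) {
      current <- current + 1L
    } else if (current > 0L) {
      # a non-zero value closes the current run of zeros
      run_lengths <- c(run_lengths, current)
      current <- 0L
    }
  }
  # a run that reaches the end of x has to be closed explicitly
  if (current > 0L) run_lengths <- c(run_lengths, current)
  table(run_lengths)
}

x <- c(0,1,0,2,0,0,1,0,0,1,0,0,0,1,0)
count_zero_runs(x)
```

Note the final check after the loop: without it, the trailing single 0 in this example would be missed.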

asked Apr 11 '18 10:04 by Robert Long




1 Answer

1) rle: Use rle and table like this. No packages are needed.

tab <- with(rle(x), table(lengths[values == 0]))

giving:

> tab
1 2 3 
3 2 1 

or

> as.data.frame(tab)
  Var1 Freq
1    1    3
2    2    2
3    3    1

That is, there are 3 runs of one zero, 2 runs of two zeros and 1 run of three zeros.
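To see why the one-liner works, it can help to look at the intermediate pieces. rle() splits x into runs of identical values and returns their values and lengths; the lengths of the runs whose value is 0 are exactly the zero-run lengths, and table() counts them. A step-by-step sketch on the same example vector (the variable names r and zero_lengths are just for illustration):

```r
x <- c(0,1,0,2,0,0,1,0,0,1,0,0,0,1,0)

r <- rle(x)
r$values   # value of each run:  0 1 0 2 0 1 0 1 0 1 0
r$lengths  # length of each run: 1 1 1 1 2 1 2 1 3 1 1

# keep only the lengths of the runs whose value is 0 ...
zero_lengths <- r$lengths[r$values == 0]
zero_lengths  # 1 1 2 2 3 1

# ... and count how often each length occurs
table(zero_lengths)
```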

The output format in the question is not really practical if there are very long runs, but just for fun, here it is:

data.frame(Sequence = strrep(0, names(tab)), Freq = as.numeric(tab))

giving:

  Sequence Freq
1        0    3
2       00    2
3      000    1
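If long runs make the strrep column unwieldy, one compact alternative is to label rows by run length instead. A plain table() also omits lengths that never occur; converting the lengths to a factor with levels 1 to the maximum run length lists those with a count of 0, which gives the full distribution. A sketch, assuming x contains at least one zero (otherwise max() has nothing to work with); the input vector here is a made-up example with no run of length 3:

```r
x <- c(0,1,0,0,0,0,1,0,0)   # hypothetical input: zero runs of length 1, 4 and 2

zero_lengths <- with(rle(x), lengths[values == 0])

# a factor with levels 1..max forces every run length to appear,
# including lengths that never occur (counted as 0)
tab_full <- table(factor(zero_lengths, levels = seq_len(max(zero_lengths))))

data.frame(Length = as.integer(names(tab_full)), Freq = as.vector(tab_full))
```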

2) gregexpr: Another possibility is to use a regular expression:

tab2 <- table(attr(gregexpr("0+", paste(x, collapse = ""))[[1]], "match.length"))

giving:

> tab2
1 2 3 
3 2 1 

Other output formats could be derived as in (1).
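Broken into steps, the regular-expression approach collapses x into one string, finds every maximal run of "0" characters with the greedy pattern "0+", and reads the run lengths from the "match.length" attribute of the gregexpr() result. It relies on every element of x being a single character, which holds here because x only contains 0, 1 and 2; a value such as 10 would add a spurious "0" to the string. A step-by-step sketch, reusing the data frame layout from (1):

```r
x <- c(0,1,0,2,0,0,1,0,0,1,0,0,0,1,0)

s <- paste(x, collapse = "")   # "010200100100010"
m <- gregexpr("0+", s)[[1]]    # start positions of the runs: 1 3 5 8 11 15
attr(m, "match.length")        # run lengths: 1 1 2 2 3 1

tab2 <- table(attr(m, "match.length"))
data.frame(Sequence = strrep(0, names(tab2)), Freq = as.numeric(tab2))
```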

Note

I checked the speed with a length(x) of 2000: (1) took about 1.6 ms on my laptop and (2) took about 9 ms.
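A comparison like this can be repeated with base R alone. The sketch below builds a random vector of length 2000 (the seed and the use of sample() are assumptions made for reproducibility, not the answerer's setup), checks that both methods give the same distribution, and times 100 repetitions of each with system.time(). Timings depend on the machine and R version, so it does not claim to reproduce the figures above.

```r
set.seed(1)                               # arbitrary seed, for reproducibility
x <- sample(0:2, 2000, replace = TRUE)    # hypothetical test vector of length 2000

tab1 <- with(rle(x), table(lengths[values == 0]))
tab2 <- table(attr(gregexpr("0+", paste(x, collapse = ""))[[1]], "match.length"))

# both methods should report the same run lengths and counts
stopifnot(identical(names(tab1), names(tab2)),
          identical(as.vector(tab1), as.vector(tab2)))

# rough timing: repeat each call 100 times so the totals are measurable
n <- 100
system.time(for (i in seq_len(n)) with(rle(x), table(lengths[values == 0])))
system.time(for (i in seq_len(n))
  table(attr(gregexpr("0+", paste(x, collapse = ""))[[1]], "match.length")))
```

For finer-grained measurements, a dedicated benchmarking package could be used instead, but system.time() keeps the sketch free of dependencies.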

answered Oct 18 '22 08:10 by G. Grothendieck