I have a vector, say x, which contains only the integer numbers 0, 1 and 2. For example:
x <- c(0,1,0,2,0,0,1,0,0,1,0,0,0,1,0)
From this I would like to extract how many times zero occurs in each "pattern". In this simple example it occurs three times on its own, twice as 00 and exactly once as 000, so I would like to output something like:
0 3
00 2
000 1
My actual dataset is quite large (1000-2000 elements in the vector), and at least in theory the maximum number of consecutive zeros is length(x).
1) rle Use rle and table like this. No packages are needed.
tab <- with(rle(x), table(lengths[values == 0]))
giving:
> tab
1 2 3
3 2 1
or
> as.data.frame(tab)
Var1 Freq
1 1 3
2 2 2
3 3 1
That is, there are 3 runs of one zero, 2 runs of two zeros and 1 run of three zeros.
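To see why the one-liner works, it may help to look at the intermediate pieces. The following is only a sketch using the example vector from the question; the names r and zero_runs are introduced here for illustration.

```r
x <- c(0,1,0,2,0,0,1,0,0,1,0,0,0,1,0)

# rle() splits x into runs of identical values
r <- rle(x)
r$values   # the value of each run:  0 1 0 2 0 1 0 1 0 1 0
r$lengths  # the length of each run: 1 1 1 1 2 1 2 1 3 1 1

# keep only the lengths of the runs whose value is 0
zero_runs <- r$lengths[r$values == 0]
zero_runs  # 1 1 2 2 3 1

# count how often each run length occurs
table(zero_runs)
```

The with(rle(x), ...) form in (1) does the same thing without creating the intermediate variables.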
The output format in the question is not really feasible if there are very long runs but just for fun here it is:
data.frame(Sequence = strrep(0, names(tab)), Freq = as.numeric(tab))
giving:
Sequence Freq
1 0 3
2 00 2
3 000 1
2) gregexpr Another possibility is to use a regular expression:
tab2 <- table(attr(gregexpr("0+", paste(x, collapse = ""))[[1]], "match.length"))
giving:
> tab2
1 2 3
3 2 1
Other output formats could be derived as in (1).
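The regular expression version can also be unpacked step by step. This is a sketch on the example vector; the names s, m and run_lengths are introduced here for illustration. Note that it relies on every element of x being a single digit: a value such as 10 would contribute a stray 0 to the collapsed string.

```r
x <- c(0,1,0,2,0,0,1,0,0,1,0,0,0,1,0)

# collapse the vector into one string: "010200100100010"
s <- paste(x, collapse = "")

# find every maximal run of zeros; gregexpr returns a list with one
# element per input string, so take the first
m <- gregexpr("0+", s)[[1]]

# the length of each match is stored in the "match.length" attribute
run_lengths <- attr(m, "match.length")
run_lengths  # 1 1 2 2 3 1

table(run_lengths)
```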
I checked the speed with a length(x) of 2000 and (1) took about 1.6 ms on my laptop and (2) took about 9 ms.
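A comparison along these lines can be reproduced with base R only. This is a rough sketch, not the exact benchmark used above: the random test vector, the seed, the repetition count and the function names f_rle and f_regex are all assumptions made for illustration, and the timings will differ from machine to machine.

```r
set.seed(1)  # assumed seed, only so the test vector is reproducible
x <- sample(0:2, 2000, replace = TRUE)

# approach (1): run-length encoding
f_rle <- function(x) with(rle(x), table(lengths[values == 0]))

# approach (2): regular expression on the collapsed string
f_regex <- function(x) {
  table(attr(gregexpr("0+", paste(x, collapse = ""))[[1]], "match.length"))
}

# both approaches should give the same counts
stopifnot(all(as.vector(f_rle(x)) == as.vector(f_regex(x))))

# total elapsed time for 100 repetitions of each approach
system.time(for (i in 1:100) f_rle(x))
system.time(for (i in 1:100) f_regex(x))
```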