Find distribution of consecutive zeros

Tags:

r

I have a vector, say x, which contains only the integers 0, 1 and 2. For example:

x <- c(0,1,0,2,0,0,1,0,0,1,0,0,0,1,0)

From this I would like to extract how many times zero occurs in each "pattern". In this simple example it occurs three times on its own, twice as 00 and exactly once as 000, so I would like to output something like:

0      3
00     2
000    1

My actual dataset is quite large (1000-2000 elements in the vector) and, at least in theory, the maximum number of consecutive zeros is length(x).

asked Apr 11 '18 10:04

Robert Long

1 Answer

1) rle

Use rle and table like this. No packages are needed.

tab <- with(rle(x), table(lengths[values == 0]))

giving:

> tab
1 2 3 
3 2 1

or as a data frame:

> as.data.frame(tab)
  Var1 Freq
1    1    3
2    2    2
3    3    1

That is, there are 3 runs of one zero, 2 runs of two zeros and 1 run of three zeros.
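To make the one-liner in (1) easier to follow, here is the same computation broken into steps. This is only a sketch using the example vector from the question; the intermediate names r and zero_runs are introduced here for illustration and are not part of the original answer.

```r
x <- c(0,1,0,2,0,0,1,0,0,1,0,0,0,1,0)

# rle() compresses x into runs of equal values:
# r$values holds the value of each run, r$lengths how long it is.
r <- rle(x)
r$values
r$lengths

# Keep only the lengths of the runs whose value is 0.
zero_runs <- r$lengths[r$values == 0]
zero_runs

# Count how often each run length occurs.
tab <- table(zero_runs)
tab
```

The names of tab are the run lengths and its entries are the counts, which is why strrep can later turn the names back into strings of zeros.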

The output format in the question is not really practical if there are very long runs, but just for fun here it is:

data.frame(Sequence = strrep(0, names(tab)), Freq = as.numeric(tab))

giving:

  Sequence Freq
1        0    3
2       00    2
3      000    1

2) gregexpr

Another possibility is to use a regular expression:

tab2 <- table(attr(gregexpr("0+", paste(x, collapse = ""))[[1]], "match.length"))

giving:

> tab2
1 2 3 
3 2 1

Other output formats could be derived as in (1).
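The regular expression approach in (2) can also be unpacked step by step. The sketch below assumes, as in the question, that every element of x prints as a single character (0, 1 or 2); with multi-digit values the pasted string would no longer line up with the vector. The names s, m and run_lengths are illustrative, and the filter for a no-match result is an addition not found in the original one-liner.

```r
x <- c(0,1,0,2,0,0,1,0,0,1,0,0,0,1,0)

# Collapse the vector into one string; this assumes every element
# prints as a single character (true for 0, 1 and 2).
s <- paste(x, collapse = "")

# gregexpr() returns the start position of every maximal run of zeros,
# with the run lengths stored in the "match.length" attribute.
m <- gregexpr("0+", s)[[1]]
run_lengths <- attr(m, "match.length")

# When nothing matches, gregexpr() reports a single -1, so drop it.
run_lengths <- run_lengths[run_lengths > 0]

tab2 <- table(run_lengths)
tab2
```

Without the filter, a vector containing no zeros at all would produce a table with one spurious entry for a run length of -1.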

Note

I checked the speed with length(x) equal to 2000: (1) took about 1.6 ms on my laptop and (2) took about 9 ms.
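A rough way to repeat such a comparison with base R only is sketched below. The test vector, the seed, the repetition count and the helper names f_rle and f_regex are all assumptions made for illustration; the data used for the timings quoted above is not stated, and absolute timings will differ between machines.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
x_big <- sample(0:2, 2000, replace = TRUE)    # hypothetical test vector of length 2000

# The two approaches from (1) and (2), wrapped as functions.
f_rle <- function(x) with(rle(x), table(lengths[values == 0]))
f_regex <- function(x) {
  table(attr(gregexpr("0+", paste(x, collapse = ""))[[1]], "match.length"))
}

# Both approaches should give the same counts for the same run lengths.
stopifnot(identical(names(f_rle(x_big)), names(f_regex(x_big))))
stopifnot(identical(as.vector(f_rle(x_big)), as.vector(f_regex(x_big))))

# Time 200 repetitions of each; divide the elapsed time by 200
# for an approximate per-call figure.
system.time(for (i in 1:200) f_rle(x_big))
system.time(for (i in 1:200) f_regex(x_big))
```

For more careful measurements a dedicated benchmarking package would be a better fit, but the loop above avoids any dependency.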

answered Oct 18 '22 08:10

G. Grothendieck
