Get runs of consecutive integers of certain length and sample from first values

Question

I am trying to create a function that will return the first integer of a subset of a vector such that the values of the subset are consecutive (each one 1 greater than the previous) and the subset has a specified length.

For example, using the input data 'v' and a specified length 'l' of 3:

v <- c(3, 4, 5, 6, 15, 16, 25, 26, 27)
l <- 3

The possible sub-vectors of consecutive values of length 3 would be:

c(3, 4, 5)
c(4, 5, 6)
c(25, 26, 27)

Then I want to randomly choose one of these vectors and return the first/lowest number, i.e. 3, 4, or 25.

Ian Campbell · Accepted Answer

Here's an approach with base R:

First, we create all possible sub-vectors of the requested length, one starting at each position of vec. Next, we keep only the sub-vectors whose successive differences (diff) all equal 1. Sub-vectors that start near the end of vec run past its last element and are padded with NA; the is.na test ensures those are filtered out as well. Then we bind the remaining vectors into a matrix and sample one value from its first column.

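SampleSequencialVectors <- function(vec, length){
  # one window of `length` values starting at each position (NA-padded past the end)
  all.vecs <- lapply(seq_along(vec),function(x)vec[x:(x+(length-1))])
  # keep windows whose successive differences are all exactly 1
  seq.vec <- all.vecs[sapply(all.vecs,function(x) all(diff(x) == 1 & !is.na(diff(x))))]
  first <- do.call(rbind,seq.vec)[,1]
  # sample a position: sample(first, 1) would draw from 1:first if only one start exists
  first[sample.int(length(first), 1)]
}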

replicate(10, SampleSequencialVectors(v, 3))
# [1]  3  4  3  3  4  4 25 25  3 25
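To make the window-and-filter steps concrete, here is a minimal sketch that runs them outside the function on the question's v. The names len and keep are illustrative only and do not come from the answer.

```r
v <- c(3, 4, 5, 6, 15, 16, 25, 26, 27)
len <- 3  # window length (the question's l)

# Same window construction as in the answer: indexing past the end of v
# pads the last windows with NA
all.vecs <- lapply(seq_along(v), function(x) v[x:(x + (len - 1))])
all.vecs[[8]]  # 26 27 NA
all.vecs[[9]]  # 27 NA NA

# Keep windows whose successive differences are all exactly 1;
# NA differences make the & FALSE, so padded windows are dropped
keep <- sapply(all.vecs, function(x) all(diff(x) == 1 & !is.na(diff(x))))
which(keep)                     # 1 2 7
sapply(all.vecs[keep], `[`, 1)  # 3 4 25
```

Assuming length(v) >= len, iterating over seq_len(length(v) - len + 1) instead of seq_along(v) would avoid building the NA-padded windows in the first place.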

Or, if you prefer a tidyverse-style approach:

library(magrittr)  # provides %>%

SampleSequencialVectorsPurrr <- function(vec, length){
  vec %>%
    seq_along %>%
    purrr::map(~vec[.x:(.x+(length-1))]) %>%
    purrr::keep(~ all(diff(.x) == 1 & !is.na(diff(.x)))) %>%
    do.call(rbind, .) %>%    # purrr::invoke() is deprecated since purrr 1.0.0
    {.[, 1]} %>%
    {.[sample.int(length(.), 1)]}
}
replicate(10, SampleSequencialVectorsPurrr(v, 3))
# [1]  4 25 25  3 25  4  4  3  4 25

Henrik · Answer

1. Split the vector into runs of consecutive values*: split(v, cumsum(c(1L, diff(v) != 1)))
2. Select runs whose length is greater than or equal to the required length lim: runs[lengths(runs) >= lim]
3. From each run, select the possible first values: x[1:(length(x) - lim + 1)]
4. From all possible first values, sample one.
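Traced on the question's v with lim = 3, the steps above produce the following intermediate values. This is a minimal sketch; the names breaks, run.id, long.runs and starts are only for illustration.

```r
v <- c(3, 4, 5, 6, 15, 16, 25, 26, 27)
lim <- 3

# A new run starts wherever the gap to the previous value is not 1
breaks <- c(1L, diff(v) != 1)  # 1 0 0 0 1 0 1 0 0
run.id <- cumsum(breaks)       # 1 1 1 1 2 2 3 3 3

runs <- split(v, run.id)       # 3 4 5 6 | 15 16 | 25 26 27
lengths(runs)                  # 4 2 3 (named "1", "2", "3")

# Runs with at least lim values, and the valid first values within each
long.runs <- runs[lengths(runs) >= lim]
starts <- lapply(long.runs, function(x) x[1:(length(x) - lim + 1)])
unlist(starts, use.names = FALSE)  # 3 4 25
```

Because only runs with length(x) >= lim survive step 2, the index 1:(length(x) - lim + 1) in step 3 always starts at a valid position.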

lim = 3  # required run length (the question's l)

runs = split(v, cumsum(c(1L, diff(v) != 1)))

first = lapply(runs[lengths(runs) >= lim], function(x) x[1:(length(x) - lim + 1)])

starts = unlist(first)
starts[sample.int(length(starts), 1)]  # safe even if only one start exists
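A caveat when sampling from the candidate first values: if x is a single number, sample(x, 1) draws from 1:x rather than returning x (the ?sample help page describes this behaviour and suggests an indexing-based resample helper). So if only one valid first value exists, say 25, sample(25, 1) can return any integer from 1 to 25. A minimal sketch of the pitfall and a safe alternative; pick_one is a hypothetical helper name.

```r
# sample() treats a single positive number n as "sample from 1:n"
starts <- c(25)    # e.g. only one run of length >= lim exists
sample(starts, 1)  # some integer in 1:25, not necessarily 25

# Sampling a position and indexing always returns an element of x
pick_one <- function(x) x[sample.int(length(x), 1)]
pick_one(starts)        # always 25
pick_one(c(3, 4, 25))   # one of 3, 4, 25
```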

Here we loop only over runs of sufficient length rather than over every individual value (as the other answer does), so this may be faster on larger vectors (I haven't tested this).
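Since the speed claim is untested, here is a minimal sketch for checking it on your own machine. It wraps both approaches in hypothetical helpers, starts_windows() and starts_runs(), that return all candidate first values instead of sampling one, runs them on made-up data, and checks that they agree. No timings are claimed here; they depend on the data and the machine.

```r
# Window-based approach (as in the accepted answer): all valid first values
starts_windows <- function(vec, len) {
  all.vecs <- lapply(seq_along(vec), function(x) vec[x:(x + (len - 1))])
  keep <- vapply(all.vecs,
                 function(x) all(diff(x) == 1 & !is.na(diff(x))),
                 logical(1))
  vapply(all.vecs[keep], `[`, numeric(1), 1)
}

# Run-based approach (as in this answer): all valid first values
starts_runs <- function(vec, len) {
  runs <- split(vec, cumsum(c(1L, diff(vec) != 1)))
  first <- lapply(runs[lengths(runs) >= len],
                  function(x) x[1:(length(x) - len + 1)])
  unlist(first, use.names = FALSE)
}

# Hypothetical test data: sorted, no duplicates, stored as double
set.seed(42)
big <- sort(as.numeric(sample(1e5, 5e4)))
len <- 3

system.time(a <- starts_windows(big, len))
system.time(b <- starts_runs(big, len))
identical(sort(a), sort(b))  # both should find the same candidate set
```

Returning the full candidate set rather than one sampled value keeps the comparison deterministic; sampling can then be applied to whichever result you prefer.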

Slightly more compact using data.table:

library(data.table)  # .N is the number of rows in each group
starts = data.table(v)[ , if (.N >= lim) v[1:(.N - lim + 1)],
                       by = .(cumsum(c(1L, diff(v) != 1)))]$V1
starts[sample.int(length(starts), 1)]

*Credit to the nice canonical Q&A "How to split a vector into groups of consecutive sequences?".

Tags:

r

vector

sequence

mallard
