I am trying to create a function that will return the first integer of a subset of a vector such that the values of the subset are discrete, increasing by 1, and of a specified length.
For example, using the input data 'v' and a specified length 'l' of 3:
v <- c(3, 4, 5, 6, 15, 16, 25, 26, 27)
l <- 3
The possible sub-vectors of consecutive values of length 3 would be:
c(3, 4, 5)
c(4, 5, 6)
c(25, 26, 27)
Then I want to randomly choose one of these vectors and return the first/lowest number, i.e. 3, 4, or 25.
Here's an approach with base R:
First, we create all possible sub-vectors of length 'length'. Next, we keep only those sub-vectors whose successive differences (diff) all equal 1. The is.na test ensures that the trailing sub-vectors, which run past the end of the input and therefore contain NA, are also filtered out. Then we bind the remaining vectors into a matrix and sample from its first column.
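SampleSequencialVectors <- function(vec, length){
  # every window of `length` elements; windows past the end are padded with NA
  all.vecs <- lapply(seq_along(vec), function(x) vec[x:(x + (length - 1))])
  # keep windows whose successive differences are all 1 (NA-padded windows fail)
  seq.vec <- all.vecs[sapply(all.vecs, function(x) all(diff(x) == 1 & !is.na(diff(x))))]
  starts <- do.call(rbind, seq.vec)[, 1]
  # sample by index: sample(x, 1) would draw from 1:x if only one start remained
  starts[sample.int(length(starts), 1)]
}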
replicate(10, SampleSequencialVectors(v, 3))
# [1] 3 4 3 3 4 4 25 25 3 25
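To see why the is.na test matters, it can help to look at the intermediate windows directly. The following is only an illustrative sketch of the steps described above, using the question's data; the names len, windows, ok and starts are made up for this example and are not part of the answer's function.

```r
v <- c(3, 4, 5, 6, 15, 16, 25, 26, 27)
len <- 3

# All windows of `len` elements; indexing past the end of `v` yields NA
windows <- lapply(seq_along(v), function(i) v[i:(i + len - 1)])
windows[[8]]
# [1] 26 27 NA

# A window qualifies only if every difference is exactly 1 and none is NA
ok <- sapply(windows, function(w) all(diff(w) == 1 & !is.na(diff(w))))

# The first element of window i is v[i], so the candidate starts are v[ok]
starts <- v[ok]
starts
# [1]  3  4 25
```

Without the is.na guard, a window such as 26 27 NA would give all(c(TRUE, NA)), which is NA rather than FALSE, and subsetting with NA would not cleanly drop it.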
Or if you prefer a tidyverse type approach:
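library(purrr)  # also provides the %>% pipe

SampleSequencialVectorsPurrr <- function(vec, length){
  vec %>%
    seq_along %>%
    purrr::map(~vec[.x:(.x+(length-1))]) %>%
    purrr::keep(~ all(diff(.x) == 1 & !is.na(diff(.x)))) %>%
    do.call(rbind, .) %>%   # purrr::invoke() is deprecated in recent purrr versions
    {.[sample.int(nrow(.), 1), 1]}   # pick a random row, return its first value
}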
replicate(10, SampleSequencialVectorsPurrr(v, 3))
# [1] 4 25 25 3 25 4 4 3 4 25
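Split the vector into runs of consecutive values (split(v, cumsum(c(1L, diff(v) != 1))))*.
Select the runs whose length is greater than or equal to the limit (runs[lengths(runs) >= lim]).
From each such run, select the possible first values (x[1:(length(x) - lim + 1)]).
From all possible first values, sample 1.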
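lim = 3  # required run length ('l' in the question)
runs = split(v, cumsum(c(1L, diff(v) != 1)))
first = unlist(lapply(runs[lengths(runs) >= lim], function(x) x[1:(length(x) - lim + 1)]))
# sample by index, because sample(x, 1) draws from 1:x when x is a single number
first[[sample.int(length(first), 1)]]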
Here we loop over runs of sufficient length, and not all individual values (see the other answers), thus it may be faster on larger vectors (haven't tested).
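Since the speed claim above is untested, here is a rough sketch of how one might check it. Everything in it is an assumption made for illustration: the test vector big, its size, and the helper names window_starts and run_starts (which only collect the candidate first values for each approach, without the final sampling step). Timings will vary by machine, so none are shown.

```r
set.seed(1)
big <- sort(sample.int(4e4, 2e4))  # hypothetical sorted test vector with many runs
lim <- 3

# Window-based approach: test every window of `len` elements
window_starts <- function(vec, len) {
  all.vecs <- lapply(seq_along(vec), function(i) vec[i:(i + len - 1)])
  keep <- sapply(all.vecs, function(w) all(diff(w) == 1 & !is.na(diff(w))))
  vec[keep]
}

# Run-based approach: split into consecutive runs, keep the long enough ones
run_starts <- function(vec, len) {
  runs <- split(vec, cumsum(c(1L, diff(vec) != 1)))
  unlist(lapply(runs[lengths(runs) >= len],
                function(x) x[1:(length(x) - len + 1)]),
         use.names = FALSE)
}

# Both should yield the same candidate first values before comparing speed
stopifnot(identical(window_starts(big, lim), run_starts(big, lim)))

system.time(window_starts(big, lim))
system.time(run_starts(big, lim))
```

For more reliable numbers, a dedicated benchmarking package would be a better choice than a single system.time call.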
Slightly more compact using data.table:
library(data.table)
sample(data.table(v)[ , if(.N >= lim) v[1:(length(v) - lim + 1)],
                     by = .(cumsum(c(1L, diff(v) != 1)))]$V1, 1)
*Credits to the nice canonical: How to split a vector into groups of consecutive sequences?