Given two separate vectors of equal length, f.start and f.end, I would like to construct a single sequence (in steps of 1) that concatenates f.start[1]:f.end[1], f.start[2]:f.end[2], ..., f.start[n]:f.end[n].
Here is an example with just 6 rows.
     f.start  f.end
[1,]   45739 122538
[2,]  125469 202268
[3,]  203563 280362
[4,]  281657 358456
[5,]  359751 436550
[6,]  437845 514644
A crude loop does the job, but it is extremely slow for larger datasets (more than 2000 rows):
f.start <- c(45739, 125469, 203563, 281657, 359751, 437845)
f.end   <- c(122538, 202268, 280362, 358456, 436550, 514644)
f.ind <- f.start[1]:f.end[1]
for (i in 2:length(f.start)) {
  f.ind.temp <- f.start[i]:f.end[i]
  f.ind <- c(f.ind, f.ind.temp)   # grows f.ind by copying it on every iteration
}
I suspect this can be done with apply(), but I have not worked out how to pass two separate arguments to apply() and would appreciate some guidance.
You can try mapply or Map, which iterate over your two vectors in parallel. Pass the function as the first argument:
vec1 = c(1,33,50)
vec2 = c(10,34,56)
unlist(Map(':',vec1, vec2))
# [1] 1 2 3 4 5 6 7 8 9 10 33 34 50 51 52 53 54 55 56
Just replace vec1 and vec2 with f.start and f.end, provided all(f.start <= f.end) holds (otherwise the ':' operator counts down for any pair with f.start[i] > f.end[i]).
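One caveat if you use mapply rather than Map: mapply simplifies its result by default (SIMPLIFY = TRUE), so when every range has the same length it returns a matrix instead of a list, and unlist() leaves a matrix untouched. That is exactly the situation with the six rows in the question, where every f.end - f.start equals 76799. Map never simplifies. A small sketch with toy vectors s and e (hypothetical values chosen so both ranges have length 3):

```r
# Toy data: both ranges have length 3, so mapply can simplify them
s <- c(1, 10)
e <- c(3, 12)

m <- mapply(":", s, e)      # simplified to a 3 x 2 matrix, not a list
dim(m)                      # 3 2
unlist(m)                   # unlist() returns a non-list (here the matrix) unchanged

v <- unlist(mapply(":", s, e, SIMPLIFY = FALSE))  # list first, so a plain vector
v                           # 1 2 3 10 11 12
identical(v, unlist(Map(":", s, e)))              # TRUE: Map uses SIMPLIFY = FALSE
```

So with mapply, either pass SIMPLIFY = FALSE or stick with Map to be safe regardless of the range lengths.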
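Your loop is slow because it grows the vector f.ind on every iteration, and each c() call copies everything accumulated so far. You will also get a speed-up if you pre-allocate the output (below, a list of length L that is filled in and unlisted at the end).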
# Some data (of length 3000)
set.seed(1)
f.start <- sample(1:10000, 3000)
f.end <- f.start + sample(1:200, 3000, TRUE)
# Functions
op <- function(L = 1) {
  # Pre-allocate a list of length L; assigning past position L grows it
  f.ind <- vector("list", L)
  for (i in seq_along(f.start)) {
    f.ind[[i]] <- f.start[i]:f.end[i]
  }
  unlist(f.ind)
}
op2 <- function() unlist(lapply(seq_along(f.start), function(x) f.start[x]:f.end[x]))
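# SIMPLIFY = FALSE: otherwise mapply returns a matrix when all ranges have equal length
col <- function() unlist(mapply(':', f.start, f.end, SIMPLIFY = FALSE))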
# check output
all.equal(op(), op2())
all.equal(op(), col())
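The pre-allocation idea can be taken one step further: instead of a list, allocate the final integer vector once at its exact length and copy each range into its own slice, so nothing grows and nothing needs unlisting. A minimal sketch; fill_ranges is a hypothetical helper name (not from either answer), it assumes all(from <= to), and it has not been benchmarked here:

```r
# Hypothetical helper: allocate the whole output once, then fill it slice by slice
fill_ranges <- function(from, to) {
  stopifnot(length(from) == length(to), all(from <= to))
  lens   <- to - from + 1       # length of each range
  out    <- integer(sum(lens))  # output allocated once, at its final size
  ends   <- cumsum(lens)        # last output index of each range
  starts <- ends - lens + 1     # first output index of each range
  for (i in seq_along(from)) {
    out[starts[i]:ends[i]] <- from[i]:to[i]
  }
  out
}

fill_ranges(c(1, 33, 50), c(10, 34, 56))
# 1 2 3 4 5 6 7 8 9 10 33 34 50 51 52 53 54 55 56
```

With the benchmark data above it can be checked with all.equal(op(), fill_ranges(f.start, f.end)); whether it is actually faster than op(L = 3000) would need to be measured.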
A few benchmarks
library(microbenchmark)
# Look at the effect of pre-allocating
microbenchmark(op(L=1), op(L=1000), op(L=3000), times=500)
# Unit: milliseconds
#         expr       min        lq     mean    median        uq       max neval cld
#    op(L = 1) 46.760416 48.741080 52.29038 49.636864 50.661506 113.08303   500   c
# op(L = 1000) 41.644123 43.965891 46.20380 44.633016 45.739895  94.88560   500   b
# op(L = 3000)  7.629882  8.098691 10.10698  8.338387  9.963558  60.74152   500   a
# Compare methods: the pre-allocated loop actually performs okay
# (the original, growing loop is left out)
microbenchmark(op(L=3000), op2(), col(), times=500)
# Unit: milliseconds
#         expr      min       lq      mean   median        uq      max neval cld
# op(L = 3000) 7.778643 8.123136 10.119464 8.367720 11.402463 62.35632   500   b
#        op2() 6.461926 6.762977  8.619154 6.995233 10.028825 57.55236   500   a
#        col() 6.656154 6.910272  8.735241 7.137500  9.935935 58.37279   500   a
So a pre-allocated loop performs reasonably well speed-wise (median about 8.4 ms, versus about 7.0 ms for op2() and 7.1 ms for col()), but of course the Colonel's Map/mapply code is a lot cleaner. The *apply functions won't give much of a speed-up in the calculation here, but they do give tidier code and remove the need for pre-allocation.
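If speed really matters, a different technique (used in neither answer above) avoids calling ':' once per range altogether: repeat each start value by its range length and add a within-range offset generated by base R's sequence(). A sketch assuming all(from <= to); ranges_vec is a hypothetical name, and its timing has not been measured here:

```r
# Fully vectorised: no explicit loop and no *apply
ranges_vec <- function(from, to) {
  lens <- to - from + 1  # length of each range
  # rep() repeats from[i] lens[i] times; sequence(lens) is 1..lens[1], 1..lens[2], ...
  # so subtracting 1 turns it into a 0-based offset within each range
  rep(from, lens) + sequence(lens) - 1L
}

ranges_vec(c(1, 33, 50), c(10, 34, 56))
# 1 2 3 4 5 6 7 8 9 10 33 34 50 51 52 53 54 55 56

# R >= 4.0.0 only: sequence() also accepts 'from', giving the same values directly
sequence(c(10, 2, 7), from = c(1, 33, 50))
```

On the question's six rows, ranges_vec(f.start, f.end) returns 6 × 76800 = 460800 values, and it never goes through mapply's simplification step, so equal-length ranges are not a concern.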