I have two monotonically increasing vectors, v1 and v2, of unequal lengths. For each value in v1 (e.g., v1[1], v1[2], ...), I want to find the largest value in v2 that is less than or equal to v1[i] and compute the difference.
My current code (see below) works correctly, but does not seem to scale up well. So I am looking for recommendations to improve my approach with the requirement of staying in R, or using a package I can call from R.
Example code:
v1 <- c(3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0)
v2 <- c(0, 2, 3.2, 4.6, 5.5, 7.1, 9.9, 12, 13)
myFunc <- function(x,v2) x - max(v2[x>=v2])
v3 <- sapply(as.list(v1), FUN = myFunc, v2)
cbind(v1,v3)
v1 v3
[1,] 3.0 1.0
[2,] 3.5 0.3 # 0.3 = 3.5 - 3.2 where 3.5 is from v1[2] and 3.2 is v2[3]
[3,] 4.0 0.8
[4,] 4.5 1.3
[5,] 5.0 0.4
[6,] 5.5 0.0
[7,] 6.0 0.5
[8,] 6.5 1.0
[9,] 7.0 1.5
[10,] 7.5 0.4
[11,] 8.0 0.9
[12,] 8.5 1.4
[13,] 9.0 1.9
[14,] 9.5 2.4
[15,] 10.0 0.1
Benchmark 1: For small vectors, say roughly 10,000 elements, the code will run in <1 second:
> v1 <- seq(3,5000,.5)
> v2 <- seq(2.2,5200,.52)
>
> {
+ start <- Sys.time()
+ v3 <- sapply(as.list(v1), FUN = myFunc, v2)
+ Sys.time() - start
+ }
Time difference of 0.8118291 secs
Benchmark 2: For vectors with roughly 100,000 elements the code takes ~60-80 seconds.
> v1 <- seq(3,50000,.5)
> v2 <- seq(2.2,52000,.52)
>
> {
+ start <- Sys.time()
+ v3 <- sapply(as.list(v1), FUN = myFunc, v2)
+ Sys.time() - start
+ }
Time difference of 1.098762 mins
So to reiterate, I am looking for recommendations to improve my approach with the requirement of staying in R, or using a package I can call from R.
Use findInterval:
v1 - v2[findInterval(v1,v2)]
#[1] 1.0 0.3 0.8 1.3 0.4 0.0 0.5 1.0 1.5 0.4 0.9 1.4 1.9 2.4 0.1
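This works because findInterval(v1, v2) returns, for each element of v1, the index of the last element of v2 that is less than or equal to it. It requires v2 to be sorted in non-decreasing order, which holds here. It uses a fast search rather than scanning all of v2 for every element of v1. One caveat: when a value in v1 is smaller than v2[1], the index is 0, and v2[0] silently drops that position, so the result no longer lines up with v1. Below is a minimal sketch that guards against this. The wrapper name diff_to_floor is hypothetical, and returning NA for such values is an assumption about the desired behavior, not part of the original answer.

```r
v1 <- c(3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0)
v2 <- c(0, 2, 3.2, 4.6, 5.5, 7.1, 9.9, 12, 13)

# Original approach from the question, kept for comparison.
myFunc <- function(x, v2) x - max(v2[x >= v2])
slow <- sapply(v1, myFunc, v2)

# Hypothetical wrapper (the name is illustrative only).
diff_to_floor <- function(v1, v2) {
  idx <- findInterval(v1, v2)    # 0 where v1[i] < v2[1]
  idx[idx == 0L] <- NA_integer_  # assumption: report NA instead of dropping the position
  v1 - v2[idx]
}

fast <- diff_to_floor(v1, v2)
all.equal(slow, fast)            # both approaches should agree on the example data

# A value below min(v2) gives NA in its position instead of shifting the result.
diff_to_floor(c(-1, 3.5), v2)

# Timing on the larger vectors from Benchmark 2.
big1 <- seq(3, 50000, .5)
big2 <- seq(2.2, 52000, .52)
system.time(v3 <- diff_to_floor(big1, big2))
```

If every value in v1 is guaranteed to be at least v2[1], as in the example data, the one-liner above is sufficient and the guard can be dropped.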