I'm subsetting a dataset before plotting, but because the key is numeric I cannot rely on the strict equality testing of match() or %in% (it misses a few values).
I wrote the following alternative, but I imagine this problem is common enough that there is a better built-in alternative somewhere? all.equal doesn't seem to be designed for multiple test values.
select_in <- function(x, ref, tol = 1e-10) {
  testone <- function(value) abs(x - value) < tol
  as.logical(rowSums(sapply(ref, testone)))
}
x = c(1.0, 1+1e-13, 1.01, 2, 2+1e-9, 2-1e-11)
x %in% c(1,2,3)
#[1] TRUE FALSE FALSE TRUE FALSE FALSE
select_in(x, c(1, 2, 3))
#[1] TRUE TRUE FALSE TRUE FALSE TRUE
This seems to achieve the goal (albeit not quite with a tolerance):
fselect_in <- function(x, ref, d = 10) {
  round(x, digits = d) %in% round(ref, digits = d)
}
fselect_in(x, c(1,2,3))
# TRUE TRUE FALSE TRUE FALSE TRUE
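To see why rounding is "not quite" a tolerance, consider two values that are much closer than 1e-10 but sit on either side of a rounding boundary. A small sketch, where `a` and `b` are made-up values chosen only for illustration:

```r
# Two illustrative values about 2e-13 apart, straddling the 10-digit rounding boundary
a <- 1 + 4.99e-11
b <- 1 + 5.01e-11

abs(a - b) < 1e-10                # absolute-tolerance test: TRUE
round(a, 10) %in% round(b, 10)    # rounding-based test: FALSE
```

So the rounding approach behaves like binning rather than a distance test, which is usually fine when the reference values are "round" numbers such as 1, 2, 3.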
I'm not sure how much better it is, but all.equal has a tolerance argument that will work:
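# tolerance used by %~% (a binary operator cannot take a third argument);
# 1e-10 matches the default in the question's select_in
tol <- 1e-10
`%~%` <- function(x, y) sapply(x, function(.x) {
  any(sapply(y, function(.y) isTRUE(all.equal(.x, .y, tolerance = tol))))
})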
x %~% c(1,2,3)
[1] TRUE TRUE FALSE TRUE FALSE TRUE
I don't like having two apply functions there. I'll try to shorten it.
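One thing worth keeping in mind with the all.equal version: for numeric inputs it compares the relative difference by default (unless the target is close to zero), so the tolerance scales with the magnitude of the values instead of being absolute as in the question's select_in. A small illustration with made-up values:

```r
big <- 1e6

# relative difference is about 1e-11, below the tolerance: TRUE
isTRUE(all.equal(big, big + 1e-5, tolerance = 1e-10))

# absolute difference is about 1e-5, far above the tolerance: FALSE
abs(big - (big + 1e-5)) < 1e-10
```

For keys near 1 to 3, as in the example data, both interpretations give the same answer.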
Update
Another way, without using all.equal, that might be faster. It turns out to be much faster than the first solution:
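tol <- 1e-10  # tolerance used by %~%; matches the question's default
`%~%` <- function(x, y) {
  out <- logical(length(x))
  # seq_along() is safe when x is empty, unlike 1:length(x)
  for (i in seq_along(x)) out[i] <- any(abs(x[i] - y) <= tol)
  out
}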
x %~% c(1,2,3)
[1] TRUE TRUE FALSE TRUE FALSE TRUE
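The explicit loop can also be written in vectorized form with outer(). This is only a sketch: the name `near_in` is hypothetical, it uses the same absolute tolerance as the loop above, and it was not part of the benchmark below.

```r
# Hypothetical helper: TRUE where x[i] is within `tol` (absolute) of any value in `ref`
near_in <- function(x, ref, tol = 1e-10) {
  # length(x) by length(ref) matrix of absolute differences
  d <- abs(outer(x, ref, "-"))
  # an element of x matches if at least one difference is within tolerance
  rowSums(d <= tol) > 0
}

x <- c(1.0, 1 + 1e-13, 1.01, 2, 2 + 1e-9, 2 - 1e-11)
near_in(x, c(1, 2, 3))
#[1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE
```

The trade-off is memory: the intermediate matrix has length(x) * length(ref) entries, so for very large inputs the loop may be the safer choice.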
Benchmark
y <- c(1, 2, 3)  # reference values used in the examples above
big.x <- rep(x, 1e3)
big.y <- rep(y, 100)
all.equal(select_in(big.x, big.y), big.x %~% big.y)
[1] TRUE
library(microbenchmark)
microbenchmark(
baptiste = select_in(big.x, big.y),
plafort2 = big.x %~% big.y,
times=50L)
Unit: milliseconds
     expr       min        lq      mean    median       uq      max neval cld
 baptiste 185.86828 199.57517 231.28246 244.81980 261.7451 271.3426    50   b
 plafort2  49.03265  54.30729  84.88076  66.10971 118.3270 123.1074    50  a