What is the fastest way to perform multiple logical comparisons in R?
Consider, for example, the vector x:
set.seed(14)
x = sample(LETTERS[1:4], size=10, replace=TRUE)
I want to test whether each entry of x is either an "A" or a "B" (and not anything else). The following works:
x == "A" | x == "B"
[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
The above code loops three times through the length of the whole vector. Is there a way in R to loop only once and test for each item whether it satisfies one or another condition?
Once you have multiple logical vectors, you can combine them using Boolean algebra. In R, & is “and”, | is “or”, ! is “not”, and xor() is exclusive or.
& and && indicate logical AND, and | and || indicate logical OR. The shorter forms perform elementwise comparisons in much the same way as arithmetic operators. The longer forms evaluate left to right, proceeding only until the result is determined; they are meant for single (length-one) conditions, and older versions of R examined only the first element of each vector.
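As a small illustration of the difference between the two forms, here is a minimal sketch; the vector v and the variable names are made up for the example and are not part of the question.

```r
# Illustrative vector (not the one from the question)
v <- c("A", "C", "B", "D")

# Elementwise OR: one logical result per element of v
elementwise <- v == "A" | v == "B"

# Short-circuit OR on single values: the left side is already TRUE,
# so the right side (which would raise an error) is never evaluated
short_circuit <- TRUE || stop("never reached")

# Exclusive or, applied elementwise
exclusive <- xor(c(TRUE, TRUE), c(TRUE, FALSE))

print(elementwise)
print(short_circuit)
print(exclusive)
```

For vectorised tests like the one in the question, the single-character forms are the ones to use; the doubled forms belong in if() conditions.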
Comparison operators also return logical values. For example, to check whether two numbers are equal, use the == operator; to check whether x is less than y, use the < operator. If the value stored in x is less than the value stored in y, the comparison x < y results in TRUE.
If your objective is just to make a single pass, that is pretty straightforward to write in Rcpp, even if you don't have much experience with C++:
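#include <Rcpp.h>
// Compile from R by calling Rcpp::sourceCpp() on the file containing this code.
// [[Rcpp::export]]
Rcpp::LogicalVector single_pass(Rcpp::CharacterVector x, Rcpp::String a, Rcpp::String b) {
    R_xlen_t i = 0, n = x.size();
    Rcpp::LogicalVector result(n);
    // One pass over x: each element is compared against both a and b
    for ( ; i < n; i++) {
        result[i] = (x[i] == a || x[i] == b);
    }
    return result;
}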
For such a small object as the one used in your example, the slight overhead of .Call (presumably) masks the speed of the Rcpp version:
r_fun <- function(X) X == "A" | X == "B"
##
cpp_fun <- function(X) single_pass(X, "A", "B")
##
all.equal(r_fun(x), cpp_fun(x))
#[1] TRUE
microbenchmark::microbenchmark(
r_fun(x), cpp_fun(x), times = 1000L)
#Unit: microseconds
#expr min lq mean median uq max neval
#r_fun(x) 1.499 1.584 1.974156 1.6795 1.8535 37.903 1000
#cpp_fun(x) 1.860 2.334 3.042671 2.7450 3.1140 51.870 1000
But for larger vectors (I'm assuming this is your real intention), it is considerably faster:
x2 <- sample(LETTERS, 10E5, replace = TRUE)
##
all.equal(r_fun(x2), cpp_fun(x2))
# [1] TRUE
microbenchmark::microbenchmark(
r_fun(x2), cpp_fun(x2), times = 200L)
#Unit: milliseconds
#expr min lq mean median uq max neval
#r_fun(x2) 78.044518 79.344465 83.741901 80.999538 86.368627 149.5106 200
#cpp_fun(x2) 7.104929 7.201296 7.797983 7.605039 8.184628 10.7250 200
Here's a quick attempt at generalizing the above, if you have any use for it.
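The generalized code itself does not appear in this post. As a rough base-R stand-in (not the Rcpp generalization the answer refers to), %in% tests each element against any number of target values in a single call. The name matches_any is hypothetical and used only for illustration, and no claim is made here about how its speed compares with the versions benchmarked earlier.

```r
# Hypothetical helper, not from the original answer:
# TRUE where an element of x equals any of the target values
matches_any <- function(x, targets) x %in% targets

# Illustrative data (not the vector from the question)
y <- c("A", "C", "B", "D", "A")

res_in <- matches_any(y, c("A", "B"))
res_or <- y == "A" | y == "B"
print(res_in)

# The two approaches differ on missing values:
# %in% gives FALSE for NA, while == combined with | gives NA
na_in <- matches_any(NA_character_, c("A", "B"))
na_or <- NA_character_ == "A" | NA_character_ == "B"
print(na_in)
print(na_or)
```

If NA should propagate the way it does with ==, the result of %in% would need to be patched afterwards, for example by assigning NA where is.na(x) is TRUE.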