Why are some of the logical operators so slow?

Tags:

boolean-logic

While trying to optimize my code, I found some logical operations were slower than I expected when compared to similar operations on integer or numeric.

So I went onto rewriting basic boolean operators !, &, |, xor as follows:

my.not <- function(x) as.logical(1L - as.integer(x)) my.and <- function(e1, e2) as.logical(as.integer(e1) * as.integer(e2)) my.or  <- function(e1, e2) as.logical(as.integer(e1) + as.integer(e2)) my.xor <- function(e1, e2) as.logical(as.integer(e1) + as.integer(e2) == 1L)

Testing that everything works as expected:

a <- sample(c(TRUE, FALSE), 1e6, TRUE) b <- sample(c(TRUE, FALSE), 1e6, TRUE)  identical(!a, my.not(a))             # TRUE identical(a & b, my.and(a, b))       # TRUE identical(a | b, my.or(a, b))        # TRUE identical(xor(a, b), my.xor(a, b))   # TRUE

Now benchmarking:

library(microbenchmark) microbenchmark(!a, my.not(a),                a & b, my.and(a, b),                a | b, my.or(a, b),                xor(a, b), my.xor(a, b))  # Unit: milliseconds #          expr       min        lq    median         uq       max neval #            !a  1.237437  1.459042  1.463259   1.492671  17.28209   100 #     my.not(a)  6.018455  6.263176  6.414515  15.291194  70.16313   100 #         a & b 32.318530 32.667525 32.769014  32.973878  50.55528   100 #  my.and(a, b)  8.010022  8.592776  8.750786  18.145590  78.38736   100 #         a | b 32.030545 32.383769 32.506937  32.820720 102.43609   100 #   my.or(a, b) 12.089538 12.434793 12.663695  22.046841  32.19095   100 #     xor(a, b) 94.892791 95.480200 96.072202 106.104000 164.19937   100 #  my.xor(a, b) 13.337110 13.708025 14.048350  24.485478  29.75883   100

Looking at the results, the ! operator is the only one that seems to do a decent job versus my own. The other three are a few times slower. A bit embarrassing for Primitive functions. I even expect that well implemented boolean operators should be a lot faster than operations on integers (how I implemented my own functions.)

Question: Why? Bad implementation? Or maybe the primitive functions are doing some good things (e.g. error checking, special cases) that my functions are not?

262

asked Sep 22 '13 00:09

flodel

1 Answers

Looking at the C implementation a little, the logical and math operations implement their loops differently. The logical operations do something like (in logic.c:327)

library(inline)  or1 <- cfunction(c(x="logical", y="logical"), "     int nx = LENGTH(x), ny = LENGTH(y), n = nx > ny ? nx : ny;     SEXP ans = PROTECT(allocVector(LGLSXP, n));     int x1, y1;     for (int i = 0; i < n; i++) {         x1 = LOGICAL(x)[i % nx];         y1 = LOGICAL(y)[i % ny];         if ((x1 != NA_LOGICAL && x1) || (y1 != NA_LOGICAL && y1))             LOGICAL(ans)[i] = 1;         else if (x1 == 0 && y1 == 0)             LOGICAL(ans)[i] = 0;         else             LOGICAL(ans)[i] = NA_LOGICAL;     }     UNPROTECT(1);     return ans; ")

where there are two modulo operators % each iteration. In contrast the arithmetic operations (in Itermacros.h:54) do something like

or2 <- cfunction(c(x="logical", y="logical"), "     int nx = LENGTH(x), ny = LENGTH(y), n = nx > ny ? nx : ny;     SEXP ans = PROTECT(allocVector(LGLSXP, n));     int x1, y1, ix=0, iy=0;     for (int i = 0; i < n; i++) {         x1 = LOGICAL(x)[ix];         y1 = LOGICAL(x)[iy];         if (x1 == 0 || y1 == 0)             LOGICAL(ans)[i] = 0;         else if (x1 == NA_LOGICAL || y1 == NA_LOGICAL)             LOGICAL(ans)[i] = NA_LOGICAL;         else             LOGICAL(ans)[i] = 1;          if (++ix == nx) ix = 0;         if (++iy == ny) iy = 0;     }     UNPROTECT(1);     return ans; ")

performing two tests for identity. Here's a version that skips the test for NA

or3 <- cfunction(c(x="logical", y="logical"), "     int nx = LENGTH(x), ny = LENGTH(y), n = nx > ny ? nx : ny;     SEXP ans = PROTECT(allocVector(LGLSXP, n));     int x1, y1, ix=0, iy=0;     for (int i = 0; i < n; ++i) {         x1 = LOGICAL(x)[ix];         y1 = LOGICAL(y)[iy];         LOGICAL(ans)[i] = (x1 || y1);         if (++ix == nx) ix = 0;         if (++iy == ny) iy = 0;     }     UNPROTECT(1);     return ans; ")

and then a version that avoids the LOGICAL macro

or4 <- cfunction(c(x="logical", y="logical"), "     int nx = LENGTH(x), ny = LENGTH(y), n = nx > ny ? nx : ny;     SEXP ans = PROTECT(allocVector(LGLSXP, n));     int *xp = LOGICAL(x), *yp = LOGICAL(y), *ansp = LOGICAL(ans);     for (int i = 0, ix = 0, iy = 0; i < n; ++i)     {         *ansp++ = xp[ix] || yp[iy];         ix = (++ix == nx) ? 0 : ix;         iy = (++iy == ny) ? 0 : iy;     }     UNPROTECT(1);     return ans; ")

Here are some timings

microbenchmark(my.or(a, b), a|b, or1(a, b), or2(a, b), or3(a, b), or4(a, b)) Unit: milliseconds         expr       min        lq    median       uq      max neval  my.or(a, b)  8.002435  8.100143 10.082254 11.56076 12.05393   100        a | b 23.194829 23.404483 23.860382 24.30020 24.96712   100    or1(a, b) 17.323696 17.659705 18.069139 18.42815 19.57483   100    or2(a, b) 13.040063 13.197042 13.692152 14.09390 14.59378   100    or3(a, b)  9.982705 10.037387 10.578464 10.96945 11.48969   100    or4(a, b)  5.544096  5.592754  6.106694  6.30091  6.94995   100

The difference between a|b and or1 reflects things not implemented here, like attributes and dimensions and special handling of objects. From or1 to or2 reflects the cost of different ways of recycling; I was surprised that there were differences here. From or2 to or3 is the cost of NA-safety. It's a little hard to know whether the additional speed-up in or4 would be seen in a base R implementation -- in user C code LOGICAL() is a macro, but in base R it's an inlined function call.

The code was compiled with -O2 flags and

> system("clang++ --version") Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) Target: x86_64-pc-linux-gnu Thread model: posix

Times of my.or were not particularly consistent between independent R sessions, sometimes taking quite a bit longer; I'm not sure why. The timings above were with R version 2.15.3 Patched (2013-03-13 r62579); current R-devel seemed about 10% faster.

125

answered Sep 28 '22 05:09

Martin Morgan

Related questions
                            
                                Efficient extraction of all sub-polygons generated by self-intersecting features in a MultiPolygon
                            
                                Why do some Unicode characters display in matrices, but not data frames in R?
                            
                                Which packages make good use of S4 objects?
                            
                                simple examples of filter function, recursive option specifically
                            
                                equivalent of R's View for Python's pandas
                            
                                How do you read a password protected excel file into r?
                            
                                How to convert a Shiny app consisting of multiple files into an easily shareable and reproducible Shiny example?
                            
                                locator equivalent in ggplot2 (for maps)
                            
                                Same random seed in Matlab and R
                            
                                How do I manipulate/access elements of an instance of "dist" class using core R?
                            
                                Create new column in dataframe based on partial string matching other column
                            
                                R shiny different output between renderTable and renderDataTable
                            
                                melt to two variable columns
                            
                                Use a variable within a plotmath expression
                            
                                Example Needed: Change the default print method of an object
                            
                                Relocating Alaska and Hawaii on thematic map of the USA with ggplot2
                            
                                How to convert a symmetric matrix into "dist" object?
                            
                                Knitr style table with CSS
                            
                                Can't run Rcpp function in foreach - "NULL value passed as symbol address"
                            
                                Aligning title, subtitle and caption for horizontal ggplot barchart

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With