Suppose I have a vector, vec
, which is long (starting at 1E8 entries) and would like to bound it to the range [a,b]
. I can certainly code vec[vec < a] = a
and vec[vec > b] = b
, but this requires two passes over the data and a large RAM allocation for the temporary indicator vector (~800MB, twice). The two passes burn time because we can do better if we copy data from main memory to the local cache just once (calls to main memory are bad, as are cache misses). And who knows how much this could be improved with multiple threads, but let's not get greedy. :)
Is there a nice implementation in base R or some package that I'm overlooking, or is this a job for Rcpp (or my old friend data.table
)?
In my benchmarking project, Base R sorts a dataset much faster than dplyr or data.
Beyond performance limitations due to design and implementation, it has to be said that a lot of R code is slow simply because it's poorly written. Few R users have any formal training in programming or software development. Fewer still write R code for a living.
A naive C solution is
library(inline)
fun4 <-
cfunction(c(x="numeric", a="numeric", b="numeric"), body4,
language="C")
body4 <- "
R_len_t len = Rf_length(x);
SEXP result = Rf_allocVector(REALSXP, len);
const double aa = REAL(a)[0], bb = REAL(b)[0], *xp = REAL(x);
double *rp = REAL(result);
for (int i = 0; i < len; ++i)
if (xp[i] < aa)
rp[i] = aa;
else if (xp[i] > bb)
rp[i] = bb;
else
rp[i] = xp[i];
return result;
"
fun4 <-
cfunction(c(x="numeric", a="numeric", b="numeric"), body4,
language="C")
With a simple parallel version (as Dirk points out, this is with CFLAGS = -fopenmp
in ~/.R/Makevars, and on a platform / compiler supporting openmp)
body5 <- "
R_len_t len = Rf_length(x);
const double aa = REAL(a)[0], bb = REAL(b)[0], *xp = REAL(x);
SEXP result = Rf_allocVector(REALSXP, len);
double *rp = REAL(result);
#pragma omp parallel for
for (int i = 0; i < len; ++i)
if (xp[i] < aa)
rp[i] = aa;
else if (xp[i] > bb)
rp[i] = bb;
else
rp[i] = xp[i];
return result;
"
fun5 <-
cfunction(c(x="numeric", a="numeric", b="numeric"), body5,
language="C")
And benchmarks
> z <- runif(1e7)
> benchmark(fun1(z,0.25,0.75), fun4(z, .25, .75), fun5(z, .25, .75),
+ replications=10)
test replications elapsed relative user.self sys.self
1 fun1(z, 0.25, 0.75) 10 9.087 14.609325 8.335 0.739
2 fun4(z, 0.25, 0.75) 10 1.505 2.419614 1.305 0.198
3 fun5(z, 0.25, 0.75) 10 0.622 1.000000 2.156 0.320
user.child sys.child
1 0 0
2 0 0
3 0 0
> identical(res1 <- fun1(z,0.25,0.75), fun4(z,0.25,0.75))
[1] TRUE
> identical(res1, fun5(z, 0.25, 0.75))
[1] TRUE
on my quad-core laptop. Assumes numeric input, no error checking, NA handling, etc.
Just to start things off: not much difference between your solution and the pmin
/pmax
solution (trying things out with n=1e7 rather than n=1e8 because I'm impatient) -- pmin
/pmax
is actually marginally slower.
fun1 <- function(x,a,b) {x[x<a] <- a; x[x>b] <- b; x}
fun2 <- function(x,a,b) pmin(pmax(x,a),b)
library(rbenchmark)
z <- runif(1e7)
benchmark(fun1(z,0.25,0.75),fun2(z,0.25,0.75),rep=50)
test replications elapsed relative user.self sys.self
1 fun1(z, 0.25, 0.75) 10 21.607 1.00000 6.556 15.001
2 fun2(z, 0.25, 0.75) 10 23.336 1.08002 5.656 17.605
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With