I was in the middle of converting some code that utilized mostly numeric data (i.e. doubles) to integers and did a quick benchmark to see how much efficiency I gained.
To my surprise it was slower... by about 20%. I thought I had done something wrong, but the original code was only a few basic arithmetical operations on moderately sized vectors, so I knew it wasn't that. Maybe my environment was messed up? I restarted fresh, and the same result... integers were less efficient.
This started a series of test and a dive into the rabbit hole. Here is my first test. We sum one million elements using base R's sum
. Note that with R version 3.5.0
the timings are quite a bit different and with v 3.5.1, the timings are about the same (still not what one would expect):
set.seed(123)
int1e6 <- sample(1:10, 1e6, TRUE)
dbl1e6 <- runif(1e6, 1, 10)
head(int1e6)
# [1] 5 3 6 8 6 2
class(int1e6)
# [1] "integer"
head(dbl1e6)
# [1] 5.060628 2.291397 2.992889 5.299649 5.217105 9.769613
class(dbl1e6)
#[1] "numeric"
mean(dbl1e6)
# [1] 5.502034
mean(int1e6)
# [1] 5.505185
## R 3.5.0
library(microbenchmark)
microbenchmark(intSum = sum(int1e6), dblSum = sum(dbl1e6), times = 1000)
Unit: microseconds
expr min lq mean median uq max neval
intSum 1033.677 1043.991 1147.9711 1111.438 1200.725 2723.834 1000
dblSum 817.719 835.486 945.6553 890.529 998.946 2736.024 1000
## R 3.5.1
Unit: microseconds
expr min lq mean median uq max neval
intSum 836.243 877.7655 966.4443 950.1525 997.9025 2077.257 1000
dblSum 866.939 904.7945 1015.3445 986.4770 1046.4120 2541.828 1000
class(sum(int1e6))
# [1] "integer"
class(sum(dbl1e6))
#[1] "numeric"
From here on out both version 3.5.0 and 3.5.1 give nearly identical results.
Here is our first dive into the rabbit hole. Along with the documentation for sum
(see ?sum
), we see that sum
is simply a generic function that is dispatched via standardGeneric
. Digging deeper, we see it eventually calls R_execMethod
here on line 516. This is where I get lost. It looks to me, like R_execClosure
is called next followed by many different possible branches. I think the standard path is to call eval
next, but I'm not sure. My guess is that eventually, a function is called in arithimetic.c but I can't find anything that specifically sums a vector of numbers. Either way, based off of my limited knowledge of method dispatching and C
in general, my naive assumption is that a function that looks like the following is called:
template <typename T>
T sum(vector<T> x) {
T mySum = 0;
for (std::size_t i = 0; i < x.size(); ++i)
mySum += x[i];
return mySum;
}
I know there is no function overloading or vectors in C
, but you get my point. My belief is that eventually, a bunch of the same type of elements are added to an element of the same type and eventually returned. In Rcpp
we would have something like:
template <typename typeReturn, typename typeRcpp>
typeReturn sumRcpp(typeRcpp x) {
typeReturn mySum = 0;
unsigned long int mySize = x.size();
for (std::size_t i = 0; i < mySize; ++i)
mySum += x[i];
return mySum;
}
// [[Rcpp::export]]
SEXP mySumTest(SEXP Rx) {
switch(TYPEOF(Rx)) {
case INTSXP: {
IntegerVector xInt = as<IntegerVector>(Rx);
int resInt = sumRcpp<int>(xInt);
return wrap(resInt);
}
case REALSXP: {
NumericVector xNum = as<NumericVector>(Rx);
double resDbl = sumRcpp<double>(xNum);
return wrap(resDbl);
}
default: {
Rcpp::stop("Only integers and numerics are supported");
}
}
}
And the benchmarks confirm my normal thinking about the inherit efficiency dominance of integers:
microbenchmark(mySumTest(int1e6), mySumTest(dbl1e6))
Unit: microseconds
expr min lq mean median uq max neval
mySumTest(int1e6) 103.455 160.776 185.2529 180.2505 200.3245 326.950 100
mySumTest(dbl1e6) 1160.501 1166.032 1278.1622 1233.1575 1347.1660 1644.494 100
This got me thinking further. Maybe it is just the complexity wrapped around standardGeneric
that makes the different data types behave strangely. So, let's skip all that jazz and go straight to the binary operators (+, -, *, /, %/%
)
set.seed(321)
int1e6Two <- sample(1:10, 1e6, TRUE)
dbl1e6Two <- runif(1e6, 1, 10)
## addition
microbenchmark(intPlus = int1e6 + int1e6Two,
dblPlus = dbl1e6 + dbl1e6Two, times = 1000)
Unit: milliseconds
expr min lq mean median uq max neval
intPlus 2.531220 3.214673 3.970903 3.401631 3.668878 82.11871 1000
dblPlus 1.299004 2.045720 3.074367 2.139489 2.275697 69.89538 1000
## subtraction
microbenchmark(intSub = int1e6 - int1e6Two,
dblSub = dbl1e6 - dbl1e6Two, times = 1000)
Unit: milliseconds
expr min lq mean median uq max neval
intSub 2.280881 2.985491 3.748759 3.166262 3.379755 79.03561 1000
dblSub 1.302704 2.107817 3.252457 2.208293 2.382188 70.24451 1000
## multiplication
microbenchmark(intMult = int1e6 * int1e6Two,
dblMult = dbl1e6 * dbl1e6Two, times = 1000)
Unit: milliseconds
expr min lq mean median uq max neval
intMult 2.913680 3.573557 4.380174 3.772987 4.077219 74.95485 1000
dblMult 1.303688 2.020221 3.078500 2.119648 2.299145 10.86589 1000
## division
microbenchmark(intDiv = int1e6 %/% int1e6Two,
dblDiv = dbl1e6 / dbl1e6Two, times = 1000)
Unit: milliseconds
expr min lq mean median uq max neval
intDiv 2.892297 3.210666 3.720360 3.228242 3.373456 62.12020 1000
dblDiv 1.228171 1.809902 2.558428 1.842272 1.990067 64.82425 1000
The classes are preserved as well:
unique(c(class(int1e6 + int1e6Two), class(int1e6 - int1e6Two),
class(int1e6 * int1e6Two), class(int1e6 %/% int1e6Two)))
# [1] "integer"
unique(c(class(dbl1e6 + dbl1e6Two), class(dbl1e6 - dbl1e6Two),
class(dbl1e6 * dbl1e6Two), class(dbl1e6 / dbl1e6Two)))
# [1] "numeric"
With every case, we see that arithmetic is 40% - 70% faster on numeric data type. What is really strange is that we get an even larger discrepancy when the two vectors being operated on are identical:
microbenchmark(intPlus = int1e6 + int1e6,
dblPlus = dbl1e6 + dbl1e6, times = 1000)
Unit: microseconds
expr min lq mean median uq max neval
intPlus 2522.774 3148.464 3894.723 3304.189 3531.310 73354.97 1000
dblPlus 977.892 1703.865 2710.602 1767.801 1886.648 77738.47 1000
microbenchmark(intSub = int1e6 - int1e6,
dblSub = dbl1e6 - dbl1e6, times = 1000)
Unit: microseconds
expr min lq mean median uq max neval
intSub 2236.225 2854.068 3467.062 2994.091 3214.953 11202.06 1000
dblSub 893.819 1658.032 2789.087 1730.981 1873.899 74034.62 1000
microbenchmark(intMult = int1e6 * int1e6,
dblMult = dbl1e6 * dbl1e6, times = 1000)
Unit: microseconds
expr min lq mean median uq max neval
intMult 2852.285 3476.700 4222.726 3658.599 3926.264 78026.18 1000
dblMult 973.640 1679.887 2638.551 1754.488 1875.058 10866.52 1000
microbenchmark(intDiv = int1e6 %/% int1e6,
dblDiv = dbl1e6 / dbl1e6, times = 1000)
Unit: microseconds
expr min lq mean median uq max neval
intDiv 2879.608 3355.015 4052.564 3531.762 3797.715 11781.39 1000
dblDiv 945.519 1627.203 2706.435 1701.512 1829.869 72215.51 1000
unique(c(class(int1e6 + int1e6), class(int1e6 - int1e6),
class(int1e6 * int1e6), class(int1e6 %/% int1e6)))
# [1] "integer"
unique(c(class(dbl1e6 + dbl1e6), class(dbl1e6 - dbl1e6),
class(dbl1e6 * dbl1e6), class(dbl1e6 / dbl1e6)))
# [1] "numeric"
That is nearly a 100% increase with every operator type!!!
How about a regular for loop in base R?
funInt <- function(v) {
mySumInt <- 0L
for (element in v)
mySumInt <- mySumInt + element
mySumInt
}
funDbl <- function(v) {
mySumDbl <- 0
for (element in v)
mySumDbl <- mySumDbl + element
mySumDbl
}
microbenchmark(funInt(int1e6), funDbl(dbl1e6))
Unit: milliseconds
expr min lq mean median uq max neval
funInt(int1e6) 25.44143 25.75075 26.81548 26.09486 27.60330 32.29436 100
funDbl(dbl1e6) 24.48309 24.82219 25.68922 25.13742 26.49816 29.36190 100
class(funInt(int1e6))
# [1] "integer"
class(funDbl(dbl1e6))
# [1] "numeric"
The difference isn't amazing, but still one would expect the integer sum to outperform the double sum. I really don't know what to think about this.
So my question is:
Why exactly do numeric data types outperform integer data types on basic arithmetical operations in base R?
Edit. Forgot to mention this:
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
F.Privé's "random guess" in the comments is really good! The function
do_arith
seems to be the starting point within arithmetic.c
. First for scalars we see that the case of REALSXP
is simple: e.g., standard +
is used. For INTSXP
there is a dispatch to, for example, R_integer_plus
, which does indeed check for integer overflow:
static R_INLINE int R_integer_plus(int x, int y, Rboolean *pnaflag)
{
if (x == NA_INTEGER || y == NA_INTEGER)
return NA_INTEGER;
if (((y > 0) && (x > (R_INT_MAX - y))) ||
((y < 0) && (x < (R_INT_MIN - y)))) {
if (pnaflag != NULL)
*pnaflag = TRUE;
return NA_INTEGER;
}
return x + y;
}
Similar for other binary operations. For vectors it is also similar. Within integer_binary
there is a dispatch to the same method, while in real_binary
the standard operations are used without any checks.
We can see this in action using the following Rcpp code:
#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
#include <cstdint>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector sumInt(IntegerVector a, IntegerVector b) {
IntegerVector result(no_init(a.size()));
std::transform(a.begin(), a.end(), b.begin(), result.begin(),
[] (int32_t x, int32_t y) {return x + y;});
return result;
}
// [[Rcpp::export]]
IntegerVector sumIntOverflow(IntegerVector a, IntegerVector b) {
IntegerVector result(no_init(a.size()));
std::transform(a.begin(), a.end(), b.begin(), result.begin(),
[] (int32_t x, int32_t y) {
if (x == NA_INTEGER || y == NA_INTEGER)
return NA_INTEGER;
if (((y > 0) && (x > (INT32_MAX - y))) ||
((y < 0) && (x < (INT32_MIN - y))))
return NA_INTEGER;
return x + y;
});
return result;
}
// [[Rcpp::export]]
NumericVector sumReal(NumericVector a, NumericVector b) {
NumericVector result(no_init(a.size()));
std::transform(a.begin(), a.end(), b.begin(), result.begin(),
[] (double x, double y) {return x + y;});
return result;
}
/*** R
set.seed(123)
int1e6 <- sample(1:10, 1e6, TRUE)
int1e6two <- sample(1:10, 1e6, TRUE)
dbl1e6 <- runif(1e6, 1, 10)
dbl1e6two <- runif(1e6, 1, 10)
microbenchmark::microbenchmark(int1e6 + int1e6two,
sumInt(int1e6, int1e6two),
sumIntOverflow(int1e6, int1e6two),
dbl1e6 + dbl1e6two,
sumReal(dbl1e6, dbl1e6two),
times = 1000)
*/
Result:
Unit: microseconds
expr min lq mean median uq max neval
int1e6 + int1e6two 1999.698 2046.2025 2232.785 2061.7625 2126.970 5461.816 1000
sumInt 812.560 846.1215 1128.826 861.9305 892.089 44723.313 1000
sumIntOverflow 1664.351 1690.2455 1901.472 1702.6100 1760.218 4868.182 1000
dbl1e6 + dbl1e6two 1444.172 1501.9100 1997.924 1526.0695 1641.103 47277.955 1000
sumReal 1459.224 1505.2715 1887.869 1530.5995 1675.594 5124.468 1000
Introducing the overflow checking into the C++ code produces a significant reduction in performance. Even though it is not as bad as the standard +
. So if you know that your integer numbers are "well behaved", you can gain quite a bit of performance by skipping R's error checking by going straight to C/C++. This reminds me of another question with a similar conclusion. The error checking done by R can be costly.
For the case with identical vectors, I get the following benchmark results:
Unit: microseconds
expr min lq mean median uq max neval
int1e6 + int1e6 1761.285 2000.720 2191.541 2011.5710 2029.528 47397.029 1000
sumInt 648.151 761.787 1002.662 767.9885 780.129 46673.632 1000
sumIntOverflow 1408.109 1647.926 1835.325 1655.6705 1670.495 44958.840 1000
dbl1e6 + dbl1e6 1081.079 1119.923 1443.582 1137.8360 1173.807 44469.509 1000
sumReal 1076.791 1118.538 1456.917 1137.2025 1250.850 5141.558 1000
There is a significant performance increase for doubles (both R and C++). For integers there is also some performance increase, but not as seizable as for doubles.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With