
Obtaining different results from sum() and '+'

Tags: r, sum

Below is my experiment:

> xx = 293.62882204364098
> yy = 0.086783439604999998
> print(xx + yy, 20)
[1] 293.71560548324595175
> print(sum(c(xx,yy)), 20)
[1] 293.71560548324600859

It seems strange to me that sum() and + give different results when both are applied to the same numbers.

Is this result expected?

How can I get the same result?

Which one is more efficient?

Bogaso asked Mar 01 '23 10:03


2 Answers

There is an r-devel thread here that includes some detailed description of the implementation. In particular, from Tomas Kalibera:

R uses long double type for the accumulator (on platforms where it is available). This is also mentioned in ?sum: "Where possible extended-precision accumulators are used, typically well supported with C99 and newer, but possibly platform-dependent."

This would imply that sum() is more accurate, although this comes with a giant flashing warning sign that if this level of accuracy is important to you, you should be very worried about the implementation of your calculations [in terms both of algorithms and underlying numerical implementations].
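To see what an extended-precision accumulator can change, here is a minimal sketch that compares a running total kept in an ordinary double with the built-in sum(). The vector x and the names acc and total are made up for illustration, and what sum() returns depends on the platform: where long double is not available, both totals may well come out the same.

```r
# Whether this build of R has a long double type wider than double
has_long_double <- capabilities("long.double")

# One large value followed by many values that are too small to change it
# when they are added one at a time in plain double precision
x <- c(1, rep(1e-16, 1e5))

# Running total kept in an ordinary double, updated with `+`
acc <- 0
for (v in x) acc <- acc + v

# Built-in sum(), which may use an extended-precision accumulator
total <- sum(x)

cat("long double available:", has_long_double, "\n")
print(acc - 1)    # contribution of the small values seen by the `+` loop
print(total - 1)  # contribution seen by sum(); platform-dependent
```

With the loop, each 1e-16 is below half the spacing between doubles near 1, so it is rounded away at every step. An extended-precision accumulator can keep those small contributions until the final conversion back to double.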

I answered a question here where I eventually figured out (after some false starts) that the difference between + and sum() is due to the use of extended precision for sum().

This code shows that the sums of individual elements (as in sum(xx,yy)) are added together with + (in C), whereas this code is used to sum the individual components; line 154 (LDOUBLE s=0.0) shows that the accumulator is stored in extended precision (if available).
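The three ways of adding the numbers from the question can be put side by side as a rough check. The variable names below are only for illustration, and whether the results differ at all depends on whether the platform has extended precision.

```r
xx <- 293.62882204364098
yy <- 0.086783439604999998

plus_result   <- xx + yy         # plain double-precision addition
args_result   <- sum(xx, yy)     # two separate arguments
vector_result <- sum(c(xx, yy))  # one vector of length two

print(c(plus = plus_result, args = args_result, vector = vector_result),
      digits = 20)

# Spacing between adjacent doubles near the result (one "unit in the last place")
ulp <- 2^(floor(log2(abs(plus_result))) - 52)

# Differences expressed in those units
print((vector_result - plus_result) / ulp)
print((args_result - plus_result) / ulp)
```

The two values printed in the question differ by about 5.7e-14, which is one such unit for numbers of this size, so the disagreement is confined to the last representable bit.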

I believe (but would be happy to be corrected) that @JonSpring's timing results are probably explained as follows: (1) sum(xx,yy) does more processing, type-checking, etc. than +; (2) sum(c(xx,yy)) is slightly slower than sum(xx,yy) because it works in extended precision.

Ben Bolker answered Mar 03 '23 23:03


Looks like addition is roughly 2-3x as fast as summing (going by the medians below), but unless you're doing high-frequency trading I can't see a situation where this would be your timing bottleneck.

xx = 293.62882204364098
yy = 0.086783439604999998

microbenchmark::microbenchmark(xx + yy, sum(xx,yy), sum(c(xx, yy)))
Unit: nanoseconds
           expr min    lq   mean median    uq  max neval
        xx + yy  88 102.5 111.90  107.0 110.0  352   100
    sum(xx, yy) 201 211.0 256.57  218.5 232.5 2886   100
 sum(c(xx, yy)) 283 297.5 330.42  304.0 311.5 1944   100

Jon Spring answered Mar 04 '23 00:03