Everything is in the question! I was doing a bit of optimization and nailing down the bottlenecks, and out of curiosity I tried this:
    library(microbenchmark)  # not loaded by default

    t1 <- rnorm(10)
    microbenchmark(mean(t1), sum(t1)/length(t1), times = 10000)
The result is that mean() is 6+ times slower than the computation "by hand"!
Does that stem from the overhead in the R code of mean() before the call to .Internal(mean), or is it the C code itself that is slower? Why? Is there a good reason, and thus a good use case?
Beyond performance limitations due to design and implementation, it has to be said that a lot of R code is slow simply because it's poorly written. Few R users have any formal training in programming or software development. Fewer still write R code for a living.
It is due to the S3 lookup for the method, and then the necessary parsing of arguments in mean.default (and also the other code in mean). sum and length are both primitive functions, so they will be fast (but how are you handling NA values?).
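You can see where that overhead comes from by inspecting the functions yourself (standard base R; the exact bodies may differ slightly between R versions):

    mean                  # the generic: all it does is dispatch via UseMethod
    ## function (x, ...) UseMethod("mean")

    body(mean.default)    # trim/na.rm argument checks happen here, before .Internal(mean(x))

    is.primitive(sum)     # TRUE: primitives skip S3 dispatch and R-level argument matching
    is.primitive(length)  # TRUE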
    t1 <- rnorm(10)
    microbenchmark(
      mean(t1),
      sum(t1)/length(t1),
      mean.default(t1),
      .Internal(mean(t1)),
      times = 10000
    )

    Unit: nanoseconds
                    expr   min    lq median    uq     max neval
                mean(t1) 10266 10951  11293 11635 1470714 10000
      sum(t1)/length(t1)   684  1027   1369  1711  104367 10000
        mean.default(t1)  2053  2396   2738  2739 1167195 10000
     .Internal(mean(t1))   342   343    685   685   86574 10000
The internal bit of mean is faster even than sum/length.
See http://rwiki.sciviews.org/doku.php?id=packages:cran:data.table#method_dispatch_takes_time (mirror) for more details (and a data.table solution that avoids .Internal).
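If you call mean repeatedly on small vectors (for example, once per group), one way to sidestep the repeated dispatch and argument-checking cost is to resolve the method once, or to wrap the internal call yourself. A minimal sketch (fast_mean is just an illustrative name, and it skips mean's trim/na.rm handling, so it is only appropriate for plain numeric vectors without NAs):

    # Look the method up once instead of paying for S3 dispatch on every call:
    m <- mean.default

    # Or bypass the R-level wrapper entirely (no trim, no na.rm, no NA checks):
    fast_mean <- function(x) .Internal(mean(x))

    t1 <- rnorm(10)
    m(t1)
    fast_mean(t1)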
Note that if we increase the length of the vector, the primitive approach is fastest:
    t1 <- rnorm(1e7)
    microbenchmark(
      mean(t1),
      sum(t1)/length(t1),
      mean.default(t1),
      .Internal(mean(t1)),
      times = 100
    )

    Unit: milliseconds
                    expr      min       lq   median       uq      max neval
                mean(t1) 25.79873 26.39242 26.56608 26.85523 33.36137   100
      sum(t1)/length(t1) 15.02399 15.22948 15.31383 15.43239 19.20824   100
        mean.default(t1) 25.69402 26.21466 26.44683 26.84257 33.62896   100
     .Internal(mean(t1)) 25.70497 26.16247 26.39396 26.63982 35.21054   100
Now method dispatch is only a fraction of the overall "time" required.
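As a rough back-of-the-envelope check (using the median timings above; exact figures will vary by machine), the dispatch and argument-checking overhead is roughly constant at about 10 microseconds, which is negligible next to the ~26 ms needed to scan 10 million elements:

    overhead_ns <- 11293 - 685     # mean(t1) minus .Internal(mean(t1)) medians, small vector
    total_ns    <- 26.56608 * 1e6  # mean(t1) median on the 1e7 vector, in nanoseconds
    overhead_ns / total_ns         # roughly 4e-04, i.e. well under 0.1% of the total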
mean is slower than computing "by hand" for several reasons:

1. S3 method dispatch
2. NA handling
3. error correction

Points 1 and 2 have already been covered. Point 3 is discussed in "What algorithm is R using to calculate mean?". Basically, mean makes 2 passes over the vector in order to correct for floating point errors, while sum only makes 1 pass over the vector.
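A minimal sketch of that two-pass idea in plain R (purely illustrative; the real implementation is C code that accumulates in long double, so its results will not necessarily match this bit for bit):

    # Pass 1: naive mean. Pass 2: add the mean of the residuals around that
    # estimate, which recovers some of the precision lost to rounding in pass 1.
    two_pass_mean <- function(x) {
      n <- length(x)
      m <- sum(x) / n
      m + sum(x - m) / n
    }

    x <- rnorm(1e5)
    c(naive = sum(x)/length(x), two_pass = two_pass_mean(x), builtin = mean(x))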
Notice that identical(sum(t1)/length(t1), mean(t1)) may be FALSE, due to these precision issues.
    > set.seed(21); t1 <- rnorm(1e7,,21)
    > identical(sum(t1)/length(t1), mean(t1))
    [1] FALSE
    > sum(t1)/length(t1) - mean(t1)
    [1] 2.539201e-16