Why is vectorization faster

I've been learning R for a while now, and have come across a lot of advice to programming types like myself to vectorize operations. Being a programmer, I'm interested in why/how it's faster. An example:

n <- 10^7
# populate with random numbers
v <- runif(n)
system.time({vv <- v*v; m <- mean(vv)}); m
system.time({for(i in 1:length(v)) { vv[i] <- v[i]*v[i] }; m <- mean(vv)}); m

This gave

   user  system elapsed 
   0.04    0.01    0.07 
[1] 0.3332091

   user  system elapsed 
  36.68    0.02   36.69 
[1] 0.3332091

The most obvious thing to consider is that we're running native code, i.e. machine code compiled from C or C++, rather than interpreted code, as shown by the massive difference in user time between the two examples (circa 3 orders of magnitude). But is there anything else going on? For example, does R do:

  • Cunning native data structures, e.g. clever ways of storing sparse vectors or matrices so that we only do multiplications when we need to?

  • Lazy evaluation, e.g. on a matrix multiply, don't evaluate cells until they are needed.

  • Parallel processing.

  • Something else.

To test whether there might be some sparse vector optimization, I tried doing dot products with different vector contents:

# populate with random numbers
v <- runif(n)
system.time({m <- v %*% v / n}); m
# populate with runs of a single 1 followed by 99 0s
v <- rep(c(1, rep(0, 99)), n/100)
system.time({m <- v %*% v / n}); m
# populate with 0s
v <- rep(0, n)
system.time({m <- v %*% v / n}); m

However, there was no significant difference in time (circa 0.09s elapsed in each case).

(Similar question for Matlab: Why does vectorized code run faster than for loops in MATLAB?)

Asked Jun 03 '13 by TooTone

3 Answers

The most obvious thing to consider is that we're running native code, i.e. machine code compiled from C or C++, rather than interpreted code

That's most of it. The other big-ish component is that, since R's design paradigm is functional, functions (attempt to) have no side effects, which means that in some (but perhaps not all; R does try to be efficient about this) instances calling [<- inside a for loop results in having to copy the entire object. That can get slow.
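
A minimal sketch of this copy-on-modify behaviour (my addition, not from the answer; tracemem() reports copies on builds with memory profiling enabled, which includes the standard CRAN binaries):

x <- runif(5)
y <- x          # y and x now share the same underlying vector
tracemem(y)     # ask R to report whenever y is duplicated
y[1] <- 0       # the first modification forces a copy of the whole vector
untracemem(y)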

A small side note: R does have rather extensive functionality for handling sparse matrix structures efficiently, but they aren't the "default".
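
A small sketch of those non-default sparse structures (my addition, assuming the recommended Matrix package is available):

library(Matrix)
m_dense  <- matrix(0, 1000, 1000)                 # ordinary dense matrix: 10^6 doubles
m_sparse <- Matrix(0, 1000, 1000, sparse = TRUE)  # dgCMatrix: stores only the non-zeros
object.size(m_dense)    # roughly 8 MB
object.size(m_sparse)   # a few KB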

Answered by joran


You are running both interpreted code and native code in both examples. The difference is that in the second you are doing the loop at the R level, resulting in many more function calls that all need to be interpreted before the compiled code is called. In your first example, the loop happens within the compiled code, so R has far less to interpret, far fewer R-level calls and far fewer calls to compiled code.
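
A minimal sketch of the difference in call patterns (my addition, with a smaller n so the loop finishes quickly):

n <- 10^5
v <- runif(n)

# Vectorized: one interpreted call to `*`, which loops over all n elements in C.
system.time(vv1 <- v * v)

# R-level loop: n interpreted iterations, each dispatching `[`, `*` and `[<-`.
vv2 <- numeric(n)                      # pre-allocate the result
system.time(for (i in seq_along(v)) vv2[i] <- v[i] * v[i])

identical(vv1, vv2)                    # TRUE: same result, very different overhead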

Answered by Gavin Simpson


In regard to parallel processing: out of the box, R does not do any parallel processing. Of course there is the built-in parallel package, but you have to adapt your code, e.g. by using mclapply, to take advantage of it. There are also options to have your linear algebra calculated in parallel using a special version of BLAS; this is not standard in R, although getting it to work does not seem that hard.
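
A minimal sketch of explicit parallelism with the built-in parallel package (my addition; mclapply forks workers on Unix-alikes, while on Windows it only runs serially with mc.cores = 1):

library(parallel)

slow_square <- function(x) { Sys.sleep(0.1); x^2 }   # stand-in for real per-element work

system.time(res_serial   <- lapply(1:8, slow_square))                 # ~0.8s elapsed
system.time(res_parallel <- mclapply(1:8, slow_square, mc.cores = 2)) # ~0.4s elapsed

identical(res_serial, res_parallel)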

Answered by Paul Hiemstra