Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

on the impulse to vectorise all the things

I'm coming to julia from years of R and Matlab, where vectorised code was key for performance and has affected my thinking. Now, reading this wonderful blog post 'more dots' by S. Johnson I'm convinced that julia's syntactic loop fusion is a great feature of the language, but it leaves me a bit confused when it comes to porting my previously vectorised code.

Is it bad / poor form to have some functions vectorised over a few parameters?Should I instead write "scalar" versions of all the routines and call the dot version (or broadcast / map) when I want to iterate over an array?

For the sake of argument, say I write the following

function sphere_vol(r)
    return 4/3*pi*r^3
end

and then use it in another function,

function density(N, r)
    V = sphere_vol(r)
    return N/V
end

and so on (many functions calling each other). My instinct (coming from other languages) is to define most functions as vectorised whenever convenient, as in sphere_vol(r) = 4/3*pi*r.^3 (trivial change). Is there any point to do so, if I can automatically call sphere_vol.(r::Array) (or map(sphere_vol, r::Array)? For a single-argument function this is not an important decision to make, but for multiple arguments it can change dramatically the implementation: I could define a vectorised version of density(N, r) vectorised over r, or over N, or both (returning a matrix), but it's much less trivial to write this way (needs to call broadcast internally over sphere_vol.() and ./). In R and Matlab I made this choice on a case-by-case basis with this compromise in mind:

  1. vectorisation is (was) more convenient and more efficient: I can call density(N::Vector, r::Vector) once and get a full array of values.

  2. writing a vectorised function over several parameters quickly becomes cumbersome and/or unmanageable (2 parameters is usually OK with some tricks); especially when the return values are not scalars.

I do not know how to make the judgment call when implementing a new function in julia.

Would it be correct to say that in julia I'd be better off writing only "scalar" versions as above? Or is it the case that some routines will be more efficient if vectorised over some parameters (e.g special functions such as Bessel, which call Fortran routines)? I'm guessing there's a subtle balance to find (according to taste/style, also), but with much less impact on performance than with R or Matlab.

like image 868
learnjulia Avatar asked Apr 11 '18 21:04

learnjulia


2 Answers

That is a good question. I can only provide my personal view on this.

The way I write code in Julia consists of two steps:

  1. Write a compact vectorized (or matricial) version of a function:

    • This is usually very quick to do because it matches mathematical derivations on paper
    • It is less error-prone, you will rarely introduce an indexing bug
    • It is also very convenient because, as you pointed out, some built-in functions expect arrays to just forward them to LAPACK, BLAS, FFTW, etc.
  2. After I have a working (and tested) implementation, I sometimes devectorize it:

    • For instance, if the function is being called multiple times in a loop, and the vectorized version allocates memory inappropriately, there is room to improve performance by not using an array at all, and just looping indices explicitly.

Again, this is just my current take on it, maybe I change mind in the future? At the moment this works pretty well for me because I am at the prototyping stage of my projects.

Going further, we often say Julia addresses the so called two-language problem. You can start naive and build very complex API. Later on, if you didn't fail badly in terms of design, there is always a way to make things fly.

like image 197
juliohm Avatar answered Nov 07 '22 16:11

juliohm


The Julia compiler can inline any function into generated code.

In your example, I would implement the scalar version sphere_vol(r::Real) = 4/3*pi*r^3 and go ahead and nest the function calls as described: density(N::Real, r::Real) = N / sphere_vol(r).

The compiler would probably pull sphere_vol into its compiled implementation of density.

The same thing goes for broadcast methods. If you need to compile the density of many spheres:

function f(x, y, z) ... stuff ... vols .= sphere_vol.(rs) ... more stuff ... end

the compiled version will likely have the scalar method baked into the broadcast.

Write the code that is easiest to inspect and maintain. Use type annotations. Julia will take care of the rest.

like image 20
TimW Avatar answered Nov 07 '22 17:11

TimW