I'm coming to Julia from years of R and Matlab, where vectorised code was key for performance and has shaped my thinking. Now, after reading the wonderful blog post 'More Dots' by S. Johnson, I'm convinced that Julia's syntactic loop fusion is a great feature of the language, but it leaves me a bit confused when it comes to porting my previously vectorised code.
Is it bad or poor form to have some functions vectorised over a few parameters? Or should I instead write "scalar" versions of all the routines and call the dot version (or broadcast / map) when I want to iterate over an array?
For the sake of argument, say I write the following
function sphere_vol(r)
    return 4/3*pi*r^3
end
and then use it in another function,
function density(N, r)
    V = sphere_vol(r)
    return N/V
end
and so on (many functions calling each other). My instinct (coming from other languages) is to define most functions as vectorised whenever convenient, as in sphere_vol(r) = 4/3*pi*r.^3 (a trivial change). Is there any point in doing so, if I can automatically call sphere_vol.(r::Array) (or map(sphere_vol, r::Array))? For a single-argument function this is not an important decision to make, but for multiple arguments it can change the implementation dramatically: I could define a version of density(N, r) vectorised over r, or over N, or both (returning a matrix), but it's much less trivial to write that way (it needs to call broadcast internally over sphere_vol.() and ./).
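For concreteness, here is a minimal sketch of the kind of calls I mean (made-up sample values, using the scalar sphere_vol and density above):

Ns = [10, 20, 30]       # hypothetical sample data
rs = [1.0, 2.0, 3.0]
density.(Ns, rs)        # element-wise over both N and r: a 3-element Vector
density.(Ns, 2.0)       # vectorised over N only, r held fixed
density.(10, rs)        # vectorised over r only, N held fixed
density.(Ns, rs')       # N against a row of r values: a 3x3 Matrix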
). In R and Matlab I made this choice on a case-by-case basis with this compromise in mind:
vectorisation is (was) more convenient and more efficient: I can call density(N::Vector, r::Vector) once and get a full array of values.
writing a vectorised function over several parameters quickly becomes cumbersome and/or unmanageable (2 parameters is usually OK with some tricks); especially when the return values are not scalars.
I do not know how to make this judgment call when implementing a new function in Julia.
Would it be correct to say that in Julia I'd be better off writing only "scalar" versions as above? Or is it the case that some routines will be more efficient if vectorised over some parameters (e.g. special functions such as Bessel functions, which call Fortran routines)? I'm guessing there's a subtle balance to find (partly a matter of taste/style, too), but with much less impact on performance than in R or Matlab.
That is a good question. I can only provide my personal view on this.
The way I write code in Julia consists of two steps:
Write a compact vectorized (or matricial) version of a function:
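For example (a minimal sketch using the names from the question; the _vec suffixes are just my own labels), a vectorised/matricial version might look like:

sphere_vol_vec(rs::AbstractVector) = 4/3 * pi .* rs.^3
density_vec(Ns::AbstractVector, rs::AbstractVector) = Ns ./ sphere_vol_vec(rs)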
After I have a working (and tested) implementation, I sometimes devectorize it:
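For instance (again just a sketch with hypothetical names), the devectorised version could be an explicit loop writing into a preallocated output:

function density_devec!(out::AbstractVector, Ns::AbstractVector, rs::AbstractVector)
    @assert length(out) == length(Ns) == length(rs)
    @inbounds for i in eachindex(out)
        out[i] = Ns[i] / (4/3 * pi * rs[i]^3)   # scalar computation inside the loop
    end
    return out
end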
Again, this is just my current take on it; maybe I'll change my mind in the future. At the moment this works pretty well for me because I am at the prototyping stage of my projects.
Going further, we often say Julia addresses the so-called two-language problem. You can start naive and build up a very complex API. Later on, if you didn't fail badly in terms of design, there is always a way to make things fly.
The Julia compiler can inline functions into the code it generates.
In your example, I would implement the scalar version sphere_vol(r::Real) = 4/3*pi*r^3 and go ahead and nest the function calls as described: density(N::Real, r::Real) = N / sphere_vol(r). The compiler would probably pull sphere_vol into its compiled implementation of density.
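If you want to see this for yourself (an optional check, not something the answer depends on), the introspection macros from InteractiveUtils show the generated code:

using InteractiveUtils          # provides @code_typed outside the REPL

sphere_vol(r::Real) = 4/3 * pi * r^3
density(N::Real, r::Real) = N / sphere_vol(r)

# The typed code for density usually shows sphere_vol's body inlined,
# i.e. no remaining call to sphere_vol.
@code_typed density(10.0, 2.0)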
The same thing goes for broadcast methods. If you need to compute the density of many spheres:
function f(x, y, z)
    # ... stuff ...
    vols .= sphere_vol.(rs)
    # ... more stuff ...
end
the compiled version will likely have the scalar method baked into the broadcast.
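Roughly speaking (my paraphrase of loop fusion, not a formal guarantee), that broadcast line behaves like a single loop over the scalar method:

# vols .= sphere_vol.(rs) fuses into something equivalent to:
for i in eachindex(vols, rs)
    vols[i] = sphere_vol(rs[i])
end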
Write the code that is easiest to inspect and maintain. Use type annotations. Julia will take care of the rest.