
How to write "good" Julia code when dealing with multiple types and arrays (multiple dispatch)

OP UPDATE: Note that in the latest version of Julia (v0.5), the idiomatic approach to answering this question is to just define mysquare(x::Number) = x^2. The vectorised case is covered using automatic broadcasting, i.e. x = randn(5) ; mysquare.(x). See also the new answer explaining dot syntax in more detail.

I am new to Julia, and given my Matlab origins, I am having some difficulty determining how to write "good" Julia code that takes advantage of multiple dispatch and Julia's type system.

Consider the case where I have a function that provides the square of a Float64. I might write this as:

    function mysquare(x::Float64)
        return(x^2);
    end

Sometimes I want to square all the Float64s in a one-dimensional array, but I don't want to write out a loop over mysquare every time, so I use multiple dispatch and add the following:

    function mysquare(x::Array{Float64, 1})
        y = Array(Float64, length(x));
        for k = 1:length(x)
            y[k] = x[k]^2;
        end
        return(y);
    end

But now I am sometimes working with Int64, so I write out two more functions that take advantage of multiple dispatch:

    function mysquare(x::Int64)
        return(x^2);
    end

    function mysquare(x::Array{Int64, 1})
        y = Array(Float64, length(x));
        for k = 1:length(x)
            y[k] = x[k]^2;
        end
        return(y);
    end

Is this right? Or is there a more idiomatic way to deal with this situation? Should I use type parameters like this?

    function mysquare{T<:Number}(x::T)
        return(x^2);
    end

    function mysquare{T<:Number}(x::Array{T, 1})
        y = Array(Float64, length(x));
        for k = 1:length(x)
            y[k] = x[k]^2;
        end
        return(y);
    end

This feels sensible, but will my code run as quickly as the case where I avoid parametric types?

In summary, there are two parts to my question:

  1. If fast code is important to me, should I use parametric types as described above, or should I write out multiple versions for different concrete types? Or should I do something else entirely?

  2. When I want a function that operates on arrays as well as scalars, is it good practice to write two versions of the function, one for the scalar, and one for the array? Or should I be doing something else entirely?

Finally, please point out any other issues you can think of in the code above as my ultimate goal here is to write good Julia code.

asked Jul 29 '14 at 06:07 by Colin T Bowers



1 Answer

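Julia compiles a specialised version of your function for each combination of concrete argument types it is called with, as required. Thus, to answer part 1, there is no performance difference between the parametric version and hand-written methods for each concrete type. The parametric way is the way to go.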
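A quick way to check this for yourself is sketched below. It assumes a recent Julia (1.x) and uses `@inferred` from the `Test` standard library, which is not part of the original answer. `@inferred` throws an error if the compiler cannot infer a concrete return type for the call.

```julia
using Test  # standard library, provides @inferred

# A single generic method; Julia compiles a specialised version
# for each concrete argument type it is actually called with.
mysquare(x::Number) = x^2

# Each call is checked for a concretely inferred return type.
a = @inferred mysquare(2.0)     # Float64 argument
b = @inferred mysquare(3)       # Int argument
c = @inferred mysquare(1.5f0)   # Float32 argument

println((typeof(a), typeof(b), typeof(c)))
```

If all three calls pass, the generic method is being specialised per concrete type, which is the behaviour the answer relies on.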

As for part 2, it might be a good idea in some cases to write a separate array version (sometimes for performance reasons, e.g., to avoid a copy). In your case, however, you can use the built-in macro @vectorize_1arg to automatically generate the array version, e.g.:

    function mysquare{T<:Number}(x::T)
        return(x^2)
    end

    @vectorize_1arg Number mysquare

    println(mysquare([1,2,3]))

As for general style, don't use semicolons, and mysquare(x::Number) = x^2 is a lot shorter.
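On Julia 0.5 and later, the update at the top of the question points to dot broadcasting instead of @vectorize_1arg. A minimal sketch of that approach, assuming a recent Julia (1.x), might look like this. The in-place `.=` line is my own illustration of the "avoid a copy" case, not something from the original answer.

```julia
# Scalar method only; no array method is defined.
mysquare(x::Number) = x^2

# Dot syntax applies the scalar method element by element.
y = mysquare.([1, 2, 3])        # [1, 4, 9]
z = mysquare.([1.5, 2.5])       # element type stays Float64

# In-place broadcast: writes the results back into w
# rather than allocating a new output array.
w = [1.0, 2.0, 3.0]
w .= mysquare.(w)

println(y)
println(z)
println(w)
```

The output element type follows from what the scalar method returns, so there is no hard-coded Float64 to get wrong.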

As for your vectorized mysquare, consider the case where T is a BigFloat. Your output array, however, is Float64. One way to handle this would be to change it to

    function mysquare{T<:Number}(x::Array{T,1})
        n = length(x)
        y = Array(T, n)
        for k = 1:n
            @inbounds y[k] = x[k]^2
        end
        return y
    end

where I've added the @inbounds macro to boost speed: it skips the bounds check on each array access, which is safe here because y is allocated with the same length as x. This function could still have issues in the event that the type of x[k]^2 isn't T. An even more defensive version would perhaps be

    function mysquare{T<:Number}(x::Array{T,1})
        n = length(x)
        y = Array(typeof(one(T)^2), n)
        for k = 1:n
            @inbounds y[k] = x[k]^2
        end
        return y
    end

where one(T) would give 1 if T is an Int, and 1.0 if T is a Float64, and so on. These considerations only matter if you want to make hyper-robust library code. If you will only ever be dealing with Float64s or things that can be promoted to Float64s, then it isn't an issue. It seems like hard work, but the power is amazing. You can always just settle for Python-like performance and disregard all type information.
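For readers on Julia 1.x, where the `f{T}(...)` method syntax and the `Array(T, n)` constructor used above no longer exist, the defensive version might look roughly like this. It is a sketch of the same idea in newer syntax, not the answer's original code: `where {T<:Number}` and `Vector{...}(undef, n)` are the modern spellings.

```julia
# Array method whose output element type is whatever squaring a T produces.
function mysquare(x::Vector{T}) where {T<:Number}
    n = length(x)
    # Allocate an uninitialised output vector of the result type.
    y = Vector{typeof(one(T)^2)}(undef, n)
    for k in 1:n
        # Safe to skip bounds checks: x and y both have length n.
        @inbounds y[k] = x[k]^2
    end
    return y
end

println(mysquare([1, 2, 3]))
println(eltype(mysquare(BigFloat[1.5, 2.0])))
```

With this version a BigFloat input gives a BigFloat output rather than being forced into a Float64 array.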

answered Sep 20 '22 at 22:09 by IainDunning