
What is idiomatic Julia style for by column or row operations?

Tags:

r

julia

Apologies if this is rather general, albeit still a coding question.

With a bit of time on my hands I've been trying to learn a bit of Julia. I thought a good start would be to copy the R microbenchmark function - so I could seamlessly compare R and Julia functions.

E.g. this is the microbenchmark output for the 2 R functions that I am trying to emulate:

Unit: seconds
expr                      min        lq    median        uq        max neval
vectorised(x, y)    0.2058464 0.2165744 0.2610062 0.2612965  0.2805144     5
devectorised(x, y)  9.7923054 9.8095265 9.8097871 9.8606076 10.0144012     5

Thus far in Julia I am trying to write idiomatic and hopefully understandable, terse code. I therefore replaced a double loop with a comprehension to create an array of timings, like so:

function timer(fs::Vector{Function}, reps::Integer)
#    funs = length(fs)
#    times = Array(Float64, reps, funs)
#    for funsitr in 1:funs
#        for repsitr in 1:reps
#            times[repsitr, funsitr] = @elapsed fs[funsitr]()
#        end
#    end

    # one row per repetition, one column per function
    times = [@elapsed fs[funs]() for x=1:reps, funs=1:length(fs)]
    return times
end

This gives an array of timings for each of 2 functions:

julia> test=timer([vec, devec], 10)
10x2 Array{Float64,2}:
 0.231621  0.173984
 0.237173  0.210059
 0.26722   0.174007
 0.265869  0.208332
 0.266447  0.174051
 0.266637  0.208457
 0.267824  0.174044
 0.26576   0.208687
 0.267089  0.174014
 0.266926  0.208741

My question (finally) is how do I idiomatically apply a function such as min, max, median across columns (or rows) of an array without using a loop?

I can of course do it easily for this simple case with a loop (similar to the one I commented out above), but I can't find anything in the docs that is equivalent to, say, apply(array,1, fun) or even colMeans.

The closest generic sort of function I can think of is

julia> [mean(test[:,col]) for col=1:size(test)[2]]
2-element Array{Any,1}:
 0.231621
 0.237173

...but the syntax really doesn't appeal. Is there a more natural way to apply functions across the columns or rows of a multidimensional array in Julia?

Stephen Henderson asked Dec 24 '13 15:12


3 Answers

The function you want is mapslices.
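As a rough sketch of what that looks like in current Julia releases (1.x), assuming the keyword form `dims=` and the `Statistics` standard library, neither of which appears in the original post; the small `times` matrix is made-up data standing in for the timing array:

```julia
using Statistics  # median lives in the Statistics standard library in Julia 1.x

# Made-up stand-in for the timing array: 3 repetitions x 2 functions.
times = [0.2 0.1;
         0.4 0.3;
         0.6 0.5]

# dims=1 passes each column to the function; the result is a 1x2 matrix.
col_medians = mapslices(median, times; dims=1)

# dims=2 passes each row to the function; the result is a 3x1 matrix.
row_maxima = mapslices(maximum, times; dims=2)
```

The result keeps the reduced dimension as a singleton, so `vec(col_medians)` can be used if a plain vector is preferred.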

John Myles White answered Nov 20 '22 15:11


Anonymous functions are currently slow in Julia, so I would not use them for benchmarking unless anonymous functions are what you want to benchmark. Otherwise the timings will mispredict the performance of code that does not use anonymous functions in its performance-critical parts.

I think you want the two-argument version of the reduction functions, like sum(arr, 1) to sum over the first dimension. If there isn't a library function available, you might use reducedim.
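For readers on a current Julia (1.x), a hedged sketch of the same idea: as far as I know the positional form `sum(arr, 1)` has become the keyword form `sum(arr, dims=1)`, and the role of `reducedim` is covered by `reduce` with a `dims` keyword. The matrix `A` below is made-up example data.

```julia
using Statistics  # provides mean

# Made-up example data: 3 rows x 2 columns.
A = [1.0 2.0;
     3.0 4.0;
     5.0 6.0]

col_sums  = sum(A, dims=1)       # reduce over rows, one value per column (1x2)
col_means = mean(A, dims=1)      # column means (1x2)
row_max   = maximum(A, dims=2)   # one value per row (3x1)

# Generic reduction with an arbitrary binary operator along a dimension.
col_prod  = reduce(*, A, dims=1) # column products (1x2)
```

These built-in reductions avoid allocating a slice per column, which is why they are usually preferable to a comprehension when a library function exists for the statistic.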

ivarne answered Nov 20 '22 16:11


I think @ivarne has the right answer (and have ticked it), but I'll just add that I made an apply-like function:

function aaply(fun::Function, dim::Integer, ar::Array)
    if !(1 <= dim <= 2)
        error("rows is 1, columns is 2")
    end
    if dim == 1
        res = [fun(ar[row, :]) for row=1:size(ar)[dim]]
    end
    if dim == 2
        res = [fun(ar[:, col]) for col=1:size(ar)[dim]]
    end
    return res
end

This then gets what I want, like so:

julia> aaply(quantile, 2, test)
2-element Array{Any,1}:
 [0.231621,0.265787,0.266542,0.267048,0.267824]
 [0.173984,0.174021,0.191191,0.20863,0.210059] 

where quantile is a built-in that gives min, lq, median, uq, and max, just like microbenchmark.

EDIT: Following the advice here, I tested the function mapslices, which works pretty much like R's apply, and benchmarked it against the function above. Note that mapslices with dim=1 slices by column, whilst test[:,1] is the first column... so the opposite of R, though it has the same indexing?

# nonsense test data big columns
julia> ar=ones(Int64,1000000,4)
1000000x4 Array{Int64,2}:

# built in function
julia> ms()=mapslices(quantile,ar,1)
ms (generic function with 1 method)

# my apply function
julia> aa()=aaply(quantile, 2, ar)
aa (generic function with 1 method)

# compare both functions
julia> aaply(quantile, 2, timer1([ms, aa], 40))
2-element Array{Any,1}:
 [0.23566,0.236108,0.236348,0.236735,0.243008] 
 [0.235401,0.236058,0.236257,0.236686,0.238958]

So the two functions are approximately as quick as each other. From reading bits of the Julia mailing list, the developers seem to intend to do some work on this part of Julia so that taking a slice returns a reference rather than a new copy of each slice (column, row, etc.)...
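A minimal sketch of slicing by reference, assuming a current Julia (1.1 or later) where `eachcol` and `@view` are available; this is not the code benchmarked above, and `A` is made-up data. Note that `quantile` here takes the probabilities explicitly, which is an assumption about the modern `Statistics` API rather than the one-argument call used in the post.

```julia
using Statistics  # provides quantile

# Made-up stand-in for the timing array: 3 repetitions x 2 functions.
A = [0.2 0.1;
     0.4 0.3;
     0.6 0.5]

# eachcol yields views of the columns, so no column is copied to build the slice.
# 0:0.25:1 asks for min, lower quartile, median, upper quartile and max.
five_num = [quantile(col, 0:0.25:1) for col in eachcol(A)]

# An explicit view of one column; writing through it changes A itself.
first_col = @view A[:, 1]
first_col[1] = 9.9
```

Because `first_col` refers to the storage of `A`, the assignment on the last line is visible as `A[1, 1]`, which is the by-reference behaviour the mailing-list discussion was aiming for.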

Stephen Henderson answered Nov 20 '22 16:11