
parallel loops in Julia - don't want work divided up before starting

My machine has 4 cores. When I run a loop with @sync @parallel, I notice that Julia divides the iterations into 4 contiguous chunks before sending them to the 4 worker processes:

# start of do_something.jl
function do_something(i, parts)
    procs = zeros(Int, parts)
    procs[i] = myid()
    total = 0.0
    for j = 1:i * 100000000
        total = total + 1e-6
    end
    return procs
end
# end of do_something.jl

# synctest3a.jl
addprocs(Sys.CPU_CORES)
@everywhere include("do_something.jl")
parts = 20
procs = @sync @parallel (+) for i = 1:parts
    do_something(i, parts)
end
@printf("procs=%s\n", procs)

Running julia synctest3a.jl gives the result below, which shows that the first 5 iterations were sent to process 2, the next 5 to process 3, and so on:

procs=[2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5]

I have an application where the time to execute do_something() can vary a lot (in this toy example it is roughly proportional to i). So what I really want is for each process to pick up the next call to do_something as soon as it is free, rather than each one always doing exactly 1/4 of the calls. How do I do that?
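To see how uneven the contiguous split is, here is a quick tally under the toy cost model, assuming that call i costs exactly i units of work (an illustration only, not a measurement). Worker k receives calls (k-1)*5+1 through k*5, so the last worker carries 90 of the 210 units while the first carries only 15.

```julia
# Toy cost model (assumption): call i costs exactly i units of work.
parts = 20
nw = 4                  # 4 workers, as in the question
chunk = parts ÷ nw      # 5 consecutive calls per worker

# Static split seen in the question: worker k gets calls (k-1)*chunk+1 : k*chunk.
loads = [sum((k - 1) * chunk + 1 : k * chunk) for k in 1:nw]
println("per-worker load = ", loads)              # [15, 40, 65, 90]
println("ideal load      = ", sum(1:parts) / nw)  # 52.5
```

With this split the run cannot finish before the busiest worker's 90 units are done, even though a perfectly balanced split would give each worker 52.5.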

Peter B asked May 16 '18 02:05


1 Answer

I think you should use pmap instead. It has a batch_size argument, which defaults to 1, meaning that the calls are sent to free workers one at a time. With pmap, of course, you have to handle the reduction operation yourself. I tried your function with pmap and observed the behavior you asked for.
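A minimal sketch of the pmap approach applied to the question's example, assuming Julia 1.x, where pmap, addprocs, myid and @everywhere come from the Distributed standard library (and @parallel has been renamed @distributed). It assumes 4 workers, defines do_something inline instead of including do_something.jl so the snippet is self-contained, scales the loop down so it finishes quickly, and wraps the call in a hypothetical helper run_with_pmap. The (+) reduction is done with reduce on the array that pmap returns.

```julia
# Assumes Julia 1.x: pmap, addprocs, myid and @everywhere live in Distributed.
using Distributed

addprocs(4)  # assumption: 4 workers, matching the 4-core machine in the question

# Same work as the question's do_something.jl, defined on every process.
@everywhere function do_something(i, parts)
    procs = zeros(Int, parts)
    procs[i] = myid()           # record which worker ran call i
    total = 0.0
    for j = 1:i * 1_000_000     # scaled down from i * 100000000 so the sketch runs quickly
        total = total + 1e-6
    end
    return procs
end

# Hypothetical helper: dynamic scheduling via pmap plus a manual reduction.
function run_with_pmap(parts)
    # batch_size = 1 (also the default) hands out one i at a time, so each
    # worker asks for the next call as soon as it finishes the current one.
    results = pmap(i -> do_something(i, parts), 1:parts; batch_size = 1)
    # pmap returns one vector per call; apply the (+) reduction ourselves.
    return reduce(+, results)
end

procs = run_with_pmap(20)
println("procs=", procs)
```

Because each worker pulls the next i only when it is free, the printed worker IDs should no longer come out in fixed blocks of 5, and the exact assignment can differ from run to run.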

Another option for controlling the scheduling is to define your own pmap-like function (the name does not matter, of course). This gives you much more control over scheduling; for example, you can change how work is assigned based on the results of previous computations. See here for an example of a pmap definition and how to define one.
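A sketch of what a self-scheduling, pmap-like function could look like, assuming Julia 1.x and 4 workers. The names dynamic_pmap and slow_square are hypothetical, and slow_square is only a stand-in workload whose cost grows with its argument. One feeder task per worker runs on the master process; whenever its worker returns a result, the task takes the next index from a shared counter. That counter is the natural place to add custom scheduling logic, such as reordering work based on earlier results.

```julia
# Assumes Julia 1.x: addprocs, workers, remotecall_fetch and @everywhere live in Distributed.
using Distributed

addprocs(4)  # assumption: 4 workers

# Hypothetical stand-in workload whose cost grows with x.
@everywhere slow_square(x) = (sleep(0.01 * x); x^2)

# Hypothetical self-scheduling map: each worker is fed the next index
# as soon as it finishes its previous one.
function dynamic_pmap(f, items)
    n = length(items)
    results = Vector{Any}(undef, n)
    next = 1
    # The feeder tasks run cooperatively on the master process, and this
    # update never yields, so the shared counter needs no lock.
    nextidx() = (idx = next; next += 1; idx)
    @sync for p in workers()
        @async while true
            idx = nextidx()
            idx > n && break
            # Blocks only this feeder task until worker p returns the result.
            results[idx] = remotecall_fetch(f, p, items[idx])
        end
    end
    return results
end

println(dynamic_pmap(slow_square, 1:8))
```

Results are stored by index, so the output order matches the input order even though the calls finish in a different order.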

hckr answered Sep 28 '22 09:09