As the title states: what exactly is the difference between @parallel and pmap? I don't mean the obvious (one's a macro for a loop and the other works on functions); I mean: how exactly do their implementations differ, and how should I use this knowledge to choose between them?
The reason I ask is that a lot of the applications I write could use either construct: I could write a loop and calculate something with @parallel, or wrap what would have been the loop body in a function and call pmap on that. I have been following the advice of using @parallel for things that are quick to evaluate and pmap for calls where each task takes much longer (as the documentation suggests), but I feel that if I had a better understanding of what each is doing, I'd be able to make better choices.
For example: does @parallel divide up the work before evaluating? I noticed that if I run a parallel loop where each inner call takes a random amount of time, @parallel can take a long time, because at the end I have very few processes still working. pmap on the same microtests doesn't seem to have this problem: is pmap redistributing the work as needed?
Other questions like this all stem from my ignorance of exactly how pmap differs from @parallel.
@parallel will take the jobs to be done and divvy them up among the available workers right away. Note that in ?@parallel we read: "The specified range is partitioned ... across all workers."
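To see what that static partitioning looks like, here is a rough sketch (static_chunks is my own illustrative helper, not Julia's actual implementation) of splitting a range into one contiguous chunk per worker before any work starts:

# Illustrative only: split a range into equal contiguous chunks,
# one per worker, before any job has run.
function static_chunks(r, nworkers)
    len = length(r)
    chunk = cld(len, nworkers)                      # ceiling division
    [r[(i-1)*chunk+1:min(i*chunk, len)] for i = 1:nworkers]
end

static_chunks(1:12, 2)    # returns [1:6, 7:12]

Note how this matches the print output further down: worker 2 gets jobs 1 through 6 and worker 3 gets jobs 7 through 12, regardless of how fast either one works.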
pmap, by contrast, will start each worker on a job. As soon as a worker finishes a job, it gives it the next available one. It is similar to queue-based multiprocessing, as is common in Python, for instance. Thus, it's not so much a case of "redistributing" work as of only giving it out at the right time and to the right worker in the first place.
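To make the queueing idea concrete, here is a sketch of how such dynamic scheduling can be written by hand. It follows the feeder-task pattern from the parallel computing section of the Julia manual; it conveys the idea behind pmap but is not pmap's actual source:

# Sketch of dynamic scheduling: a shared counter hands out the next
# job index only when a worker becomes free. Assumes addprocs has
# already been called, as in the example below.
function dynamic_map(f, lst)
    n = length(lst)
    results = Vector{Any}(n)
    i = 0
    nextidx() = (i += 1; i)    # tasks only switch at blocking calls, so this is safe
    @sync for p in workers()
        @async while true       # one feeder task per worker
            idx = nextidx()
            idx > n && break
            results[idx] = remotecall_fetch(f, p, lst[idx])
        end
    end
    results
end

Each feeder task blocks in remotecall_fetch until its worker is done, then immediately asks nextidx() for more, so a fast worker naturally ends up pulling more jobs.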
I cooked up the following example, which I believe illustrates this. In this somewhat silly example, we have two workers, one of which is slow and the other of which is twice as fast. Ideally, we would want to give the fast worker twice as much work as the slow worker (or, more realistically, we would have fast and slow jobs, but the principle is exactly the same). pmap will accomplish this, but @parallel won't.
For each test, I initialize the following:
addprocs(2)

@everywhere begin
    function parallel_func(idx)
        workernum = myid() - 1    # worker 2 sleeps 1 second per job,
        sleep(workernum)          # worker 3 sleeps 2: worker 2 is twice as fast
        println("job $idx")
    end
end
Now, for the @parallel test, I run the following:
@parallel for idx = 1:12
parallel_func(idx)
end
And get back this print output:
julia> From worker 2: job 1
From worker 3: job 7
From worker 2: job 2
From worker 2: job 3
From worker 3: job 8
From worker 2: job 4
From worker 2: job 5
From worker 3: job 9
From worker 2: job 6
From worker 3: job 10
From worker 3: job 11
From worker 3: job 12
It's almost sweet. The workers have "shared" the work evenly. Note that each worker has completed 6 jobs, even though worker 2 is twice as fast as worker 3. It may be touching, but it is inefficient: worker 3 grinds through its half in 6 × 2 = 12 seconds, while worker 2 finishes its half in 6 seconds and then sits idle.
For the pmap test, I run the following:
pmap(parallel_func, 1:12)
and get the output:
From worker 2: job 1
From worker 3: job 2
From worker 2: job 3
From worker 2: job 5
From worker 3: job 4
From worker 2: job 6
From worker 2: job 8
From worker 3: job 7
From worker 2: job 9
From worker 2: job 11
From worker 3: job 10
From worker 2: job 12
Now, note that worker 2 has performed 8 jobs and worker 3 has performed 4. This is exactly in proportion to their speeds (each worker stays busy for about 8 seconds), which is what we want for optimal efficiency. pmap is a hard taskmaster: from each according to their ability.
Thus, the recommendations in the Julia docs make sense. If you have small, simple jobs, these issues with @parallel are unlikely to cause problems, and its lower per-job scheduling overhead pays off. For bigger or more complex jobs, though, pmap has advantages.
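As an aside, if I remember the API correctly, pmap (in Julia 0.5 and later) also accepts a batch_size keyword, which hands out jobs in batches to amortize the per-job scheduling overhead while keeping the dynamic queueing:

# Hypothetical tuning of the example above: dispatch jobs three at a time.
pmap(parallel_func, 1:12, batch_size=3)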