Parallel implementation slower than serial in Julia

Question

Why does the parallel implementation run slower than the serial one in the following Julia code?

using Distributed

@everywhere function ext(i::Int64)
   callmop = `awk '{ sum += $1 } END { print sum }' infile_$(i)`
   run(callmop)
end

function fpar()
   @sync @distributed for i = 1:10
      ext(i)
   end
end

function fnopar()
   for i = 1:10
      ext(i)
   end
end

val, t_par, bytes, gctime, memallocs = @timed fpar()
val, t_nopar, bytes, gctime, memallocs = @timed fnopar()

println("Parallel: $(t_par) s. Serial: $(t_nopar) s")  
# Parallel: 0.448290379 s. Serial: 0.028704802 s

The files infile_$(i) contain a single column of real numbers. After some research I came across this post and this other post, which deal with similar problems. They seem a bit dated though, considering the speed at which Julia is being developed. Is there any way to improve this parallel section? Thank you very much in advance.

Przemyslaw Szufel · Accepted Answer

Your code is correct, but you are measuring the performance incorrectly.

Note that for this use case (calling external processes) you should be fine with green threads - no need to distribute the load at all!

When a Julia function is executed for the first time, it is compiled. When you execute it on several parallel processes, each of them needs to compile the same piece of code.

On top of that, the first run of the @distributed macro also takes a long time to compile. Hence, before using @timed you should call both the fpar and fnopar functions once.

Last but not least, there is no addprocs in your code, but I assume that you used the -p Julia option to add worker processes to your Julia master process. By the way, you did not mention how many worker processes you have.
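As a minimal sketch of the alternative to the -p option, workers can also be added from inside the script. The worker count of 2 is an arbitrary assumption for illustration, not something taken from the question:

```julia
using Distributed

# Add workers only if Julia was started without -p (only the master process exists).
# The count of 2 is an arbitrary choice for this sketch.
if nprocs() == 1
    addprocs(2)
end

println("processes: ", nprocs(), ", workers: ", nworkers())
```

Note that @everywhere only reaches the processes that exist when it runs, so the workers should be added before ext is defined.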

I usually test code like this:

@time fpar()
@time fpar()
@time fnopar()
@time fnopar()

The first measurement of each pair includes the compile time; the second shows the actual running time.
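The same warm-up idea can be sketched with @elapsed, which returns the elapsed seconds as a number so the two calls can be compared directly. The function work below is a hypothetical stand-in for fpar or fnopar, not part of the original code:

```julia
# Hypothetical stand-in for fpar/fnopar: any function not yet called in this session.
work(n) = sum(sqrt(i) for i in 1:n)

t_first  = @elapsed work(10_000)   # first call: includes JIT compilation
t_second = @elapsed work(10_000)   # second call: compiled code only

println("first call:  $(t_first) s")
println("second call: $(t_second) s")
```

The first number is typically much larger than the second, which is the effect that distorts a single @timed measurement.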

It is also worth having a look at the BenchmarkTools package and the @btime macro.

Regarding performance tests, @distributed has a significant communication overhead. In some scenarios this can be mitigated by using SharedArrays, in others by using Threads.@threads. However, in your case the fastest code would be the one using green threads:

function ffast()
   @sync for i = 1:10
      @async ext(i)
   end
end
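If the sums are needed as values rather than printed output, the same green-thread pattern can collect them. This is only a sketch under assumptions: ext_capture and fcollect are hypothetical names, and echo stands in for the awk call so the example runs without the input files (it assumes a Unix-like system with echo on the PATH):

```julia
# Hypothetical variant of ext: returns the command's output instead of printing it.
# `echo` is a stand-in for the awk command from the question.
ext_capture(i) = strip(read(`echo $(i)`, String))

function fcollect(n)
    results = Vector{String}(undef, n)
    @sync for i in 1:n
        # Each task blocks on its own external process, so the waits overlap.
        @async results[i] = ext_capture(i)
    end
    return results
end

println(fcollect(3))
```

Each task writes to its own slot of results, so no locking is needed, and @sync guarantees that all slots are filled before the function returns.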

Tags:

performance

parallel-processing

julia

external-process

panadestein
