Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Julia Threads.@threads doesn't work in a simple example

Tags:

julia

I am running a double loop in Julia. The code is quite simple.

w = rand(1000,1000)
function regular_demo(w::Array{Float64, 2})
    n = size(w)[1]
    G = zeros(n,n)
    @inbounds for j in 1:n-1
    wjj = w[j, j]
        for i in j+1:n
            wii = w[i, i]
            wij = w[i, j]
            sum = wii + 2wij + 3wjj 
            sum_inverse = inv(sum)
            G[j,j] = G[j,j] + sum_inverse
            G[i,i] = G[i,i] + sum_inverse
            G[i,j] = G[i,j] - sum_inverse
            G[j,i] = G[j,i] - sum_inverse         
        end
    end
G
end
regular_demo(w)
function thread_demo(w::Array{Float64, 2})
    n = size(w)[1]
    G = zeros(n,n)
    @inbounds Threads.@threads for j in 1:n-1
    wjj = w[j, j]
        for i in j+1:n
            wii = w[i, i]
            wij = w[i, j]
            sum = wii + 2wij + 3wjj 
            sum_inverse = inv(sum)
            G[j,j] = G[j,j] + sum_inverse
            G[i,i] = G[i,i] + sum_inverse
            G[i,j] = G[i,j] - sum_inverse
            G[j,i] = G[j,i] - sum_inverse         
        end
    end
G
end
thread_demo(w)

The only difference between regular_demo(w) and thread_demo(w) is @inbounds Threads.@threads in line 4 of the function. If I run both regular_demo(w) and thread_demo(w) for the same matrix w. The output matrix G are different. How can I implement thread in thread_demo(w) for the purpose of getting the same output matrix G as regular_demo(w).

like image 512
Mizzle Avatar asked Jan 15 '20 00:01

Mizzle


1 Answers

The correct function should look like below

function thread_demo(w::Array{Float64, 2})
    n = size(w)[1]
    G = [Threads.Atomic{Float64}(0.0) for i in 1:n, j in 1:n]
    a =  n % 2 == 1 ? 1 : 0
    @inbounds Threads.@threads for j2 in 1:n-1
        j = j2 % 2 == 1 ? j2 : n - j2 + a
        wjj = w[j, j]
        for i in j+1:n
            sum_inverse = inv(w[i, i] + 2w[i, j] + 3wjj )
            for i in 1:1000
                sum_inverse = inv(sum_inverse)
            end #used for performance testing
            Threads.atomic_add!(G[j,j], sum_inverse)
            Threads.atomic_add!(G[j,i], -sum_inverse)
            Threads.atomic_add!(G[i,i], sum_inverse)
            Threads.atomic_add!(G[i,j], -sum_inverse)
        end
    end
    G
end

The following two things needed to be corrected in your code:

  • Threads.@threads slices the elements into equal parts, however the size of your inner loop decreases with larger j values. This calls for recalculation of j to have a more or less equal job distribution across threads
  • Your G identifiers overlaps for various values of j. This means you need to synchronize between threads for operations on the same data. You can use locks (e.g. Threads.SpinLock) or atomic values for this. Here I use the atomic values because they are more natural in this example.

You can check that my code works by running the following test:

gg1 = regular_demo(w);
gg2 = thread_demo(w);
@assert all(gg1 .≈ getproperty.(gg2, :value))

You need to use the approximate equal operator because parallelism can change the order of arithmetic operations and you get slightly different results.

Now the problem is that atomics are expensive. My code will take much longer to execute than your regular_demo function because coordination between threads is required and the operations inside your loop are very cheap. However, most likely your scenario is much more complicated than this MWE and with my code you will observe significant performance gain (for testing parallelism you could try to add something like for i in 1:1000; sum_inverse = inv(sum_inverse); end inside the inner loop and see yourself).

Note that In order for multithreading to work in Julia you need to set the JULIA_NUM_THREADS variable with the number of desired threads (not larger than the number of logical CPU cores).

Windows:

set JULIA_NUM_THREADS=4

Linux:

export JULIA_NUM_THREADS=4

Note that some IDEs set such as Juno by default set the number of threads equal to the number of CPU cores.

Last but not least, you can test your configuration by running Threads.nthreads():

julia> Threads.nthreads()
4
like image 50
Przemyslaw Szufel Avatar answered Oct 21 '22 02:10

Przemyslaw Szufel