Shared array usage in Julia

Tags: arrays, parallel-processing, julia

I need to parallelise a certain task over a number of workers. To that purpose I need all workers to have access to a matrix that stores the data.

I thought that the data matrix could be implemented as a Shared Array in order to minimise data movement.

To get started with Shared Arrays, I am trying the following very simple example, which gives me what I think is unexpected behaviour:

julia -p 2

# the data matrix
D = SharedArray(Float64, 2, 3)

# initialise the data matrix with dummy values
for ii=1:length(D)
   D[ii] = rand()
end

# Define some kind of dummy computation involving the shared array 
f = x -> x + sum(D)

# call function on worker
@time fetch(@spawnat 2 f(1.0))

The last command gives me the following error:

 ERROR: On worker 2:
 UndefVarError: D not defined
 in anonymous at none:1
 in anonymous at multi.jl:1358
 in anonymous at multi.jl:904
 in run_work_thunk at multi.jl:645
 in run_work_thunk at multi.jl:654
 in anonymous at task.jl:58
 in remotecall_fetch at multi.jl:731
 in call_on_owner at multi.jl:777
 in fetch at multi.jl:795

I thought that the Shared Array D should be visible to all workers? I am clearly missing something basic. Thanks in advance.

asked Mar 02 '16 15:03

user1438310

2 Answers

Although the underlying data is shared with all workers, the variable D itself is not: it is defined only on the main process. You still need to pass a reference to D, so something like

f = (x,SA) -> x + sum(SA)
@time fetch(@spawnat 2 f(1.0,D))

should work. You can change D on the main process and see that the worker is in fact using the same data:

julia> # call function on worker
       @time fetch(@spawnat 2 f(1.0,D))
  0.325254 seconds (225.62 k allocations: 9.701 MB, 5.88% gc time)
4.405613684678047

julia> D[1] += 1
1.2005544517241717

julia> # call function on worker
       @time fetch(@spawnat 2 f(1.0,D))
  0.004548 seconds (637 allocations: 45.490 KB)
5.405613684678047
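
For readers on a current Julia release, the same idea can be sketched as a self-contained script. This is a minimal sketch, not the code from the session above: it assumes Julia 1.x, where SharedArrays and Distributed are standard libraries and the constructor is written SharedArray{Float64}(2, 3). It also swaps @spawnat for remotecall_fetch so the function and the array are both passed explicitly, and it uses deterministic values instead of rand() so the result is predictable. The names w, before and after are illustrative.

```julia
using Distributed
addprocs(2)                      # stands in for starting Julia with `julia -p 2`
@everywhere using SharedArrays   # workers need the package to receive the array

# 2x3 shared matrix backed by shared memory on this host
D = SharedArray{Float64}(2, 3)
for ii in eachindex(D)
    D[ii] = ii                   # deterministic values 1.0 through 6.0
end

# The array is an explicit argument, so the worker never looks up a global D
f = (x, SA) -> x + sum(SA)

w = first(workers())
before = remotecall_fetch(f, w, 1.0, D)

D[1] += 1                        # modify the data on the main process only
after = remotecall_fetch(f, w, 1.0, D)

println(after - before)
```

If the worker really reads the shared data, the two results should differ by exactly the increment applied on the main process.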

answered Oct 25 '22 14:10

Daniel Arndt

It also works without passing D as an argument when everything is wrapped in a function, because the closure f then captures the local variable D:

function dothis()
    D = SharedArray{Float64}(2, 3)

    # initialise the data matrix with dummy values
    for ii=1:length(D)
       D[ii] = ii #not rand() anymore
    end

    # Define some kind of dummy computation involving the shared array 
    f = x -> x + sum(D)

    # call function on worker
    @time fetch(@spawnat 2 f(1.0))
end

julia> dothis()
1.507047 seconds (206.04 k allocations: 11.071 MiB, 0.72% gc time)
22.0
julia> dothis()
0.012596 seconds (363 allocations: 19.527 KiB)
22.0

So this answers the OP's question, and the SharedArray is visible to all workers, but is this legitimate?
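
One way to look at it: inside dothis(), D is a local variable, so the anonymous function captures the array itself and the captured reference travels with the closure when it is sent to the worker. That is ordinary closure behaviour rather than a trick. The sketch below reproduces the same capture at top level with a let block. It assumes Julia 1.x with the Distributed and SharedArrays standard libraries, uses remotecall_fetch in place of @spawnat, and the names g, SA and result are illustrative.

```julia
using Distributed
addprocs(2)                      # stands in for starting Julia with `julia -p 2`
@everywhere using SharedArrays   # workers need the package to receive the array

D = SharedArray{Float64}(2, 3)
for ii in eachindex(D)
    D[ii] = ii                   # same dummy values as in dothis()
end

# `let` creates a local binding, so the closure stores the array itself
# instead of referring to a global named D
g = let SA = D
    x -> x + sum(SA)
end

result = remotecall_fetch(g, first(workers()), 1.0)
println(result)
```

With the values 1 through 6 this should give the same 22.0 that dothis() returns.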

answered Oct 25 '22 12:10

branden_burger
