Data management in a parallel for-loop in Julia

I'm trying to do some statistical analysis using Julia. The code consists of the files script.jl (e.g. initialisation of the data) and algorithm.jl.

The number of simulations is large (at least 100,000) so it makes sense to use parallel processing.

The code below is just pseudocode to illustrate my question:

function script(simulations::Int64)

# initialise input data
...

# initialise other variables for statistical analysis using zeros()
...

require("algorithm.jl")

@parallel for z = 1:simulations
  while true

    choices = algorithm(data);      

    if length(choices) == 0
      break
    else
      # process choices and pick one (which alters the data)
      ...
    end

  end
end

# display results of statistical analysis
...

end

and

function algorithm(data)

# actual algorithm
...

return choices;

end

As an example, I would like to know how many choices there are on average, what the most common choice is, and so on. For this purpose I need to save some data from choices (inside the for-loop) to the statistical analysis variables (initialised before the for-loop) and display the results (after the for-loop).

I've read about using @spawn and fetch(), and about functions like pmap(), but I'm not sure how I should proceed. Just using the variables inside the for-loop does not work, because each process gets its own copy, so the values of the statistical analysis variables after the for-loop will just be zeros.

[Edit] In Julia I use include("script.jl") and script(100000) to run the simulations. There are no issues when using a single process. However, when using multiple processes (e.g. after addprocs(3)), all statistical variables are zeros after the for-loop, which is to be expected.

Asked by Ailurus, Nov 10 '22 18:11


1 Answer

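It seems that you want to parallelize an inherently serial operation, because each step depends on the result of the previous one (in this case, data). @parallel also has a reducing form, in which the value of the last expression of each iteration is combined by a function you supply. I think if you could implement the above code like: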

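# dosomethingwithdata is a two-argument reducer that combines
# the results of the individual iterations
@parallel (dosomethingwithdata) for z = 1:simulations
  while true

    choices = algorithm(data, z)

    if length(choices) == 0
      break
    else
      # process choices and pick one (which alters the data)
      ...
    end

  end

  # the last expression of the loop body is the value
  # that @parallel hands to the reducer
  data
end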

then you may find a parallel solution for the problem.
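To make the reducing form concrete, here is a minimal sketch in current Julia, where @parallel has been renamed @distributed and lives in the Distributed standard library. The body of algorithm, the helper simulate, and the toy data 1:5 are assumptions made up for illustration; they are not the asker's actual code. Each simulation works on its own copy of the data and returns its statistics as a small vector, and (+) adds those vectors up on the calling process.

```julia
using Distributed   # @parallel was renamed @distributed in Julia 1.0

# Hypothetical stand-ins for algorithm.jl; @everywhere defines them on all processes.
@everywhere begin
    # Assumed toy algorithm: every remaining item is a possible choice.
    algorithm(data) = copy(data)

    # One independent simulation on a private copy of the data.
    # Returns [number of steps, total number of choices seen].
    function simulate(initial)
        data = copy(initial)
        steps, total = 0, 0
        while true
            choices = algorithm(data)
            isempty(choices) && break
            steps += 1
            total += length(choices)
            # pick one choice at random, which alters the data
            deleteat!(data, rand(1:length(data)))
        end
        return [steps, total]
    end
end

function script(simulations::Int)
    initial = collect(1:5)   # assumed toy input data
    # The last expression of each iteration is combined with (+);
    # the reduced vector is returned to the calling process.
    stats = @distributed (+) for z = 1:simulations
        simulate(initial)
    end
    return stats[2] / stats[1]   # average number of choices per step
end
```

With this toy data every run sees 5, 4, 3, 2 and 1 choices, so script(1000) returns 3.0 no matter how the work is split. To use several processes, call addprocs(3) (or start Julia with julia -p 3) before the @everywhere block so that the definitions reach the workers. Statistics such as the most common choice can be handled the same way by returning a vector of counts per choice from each simulation.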

Answered by Reza Afzalan, Nov 15 '22 06:11