Julia: Parallel For loop with large data move

I want to run a parallel for loop. Each of my processes needs access to two large dictionaries, gene_dict and transcript_dict. This is what I tried first:

@everywhere( function EM ... end )

generefs  = [ @spawnat i genes for i in 2:nprocs()]
dict1refs = [ @spawnat i gene_dict for i in 2:nprocs()]
dict2refs = [ @spawnat i transcript_dict for i in 2:nprocs()]

result = @parallel (vcat) for i in 1:length(genes)
  EM(genes[i], gene_dict, transcript_dict)
end

but I get the following error on all processes (not just on 5):

exception on 5: ERROR: genes not defined
 in anonymous at no file:1514
 in anonymous at multi.jl:1364
 in anonymous at multi.jl:820
 in run_work_thunk at multi.jl:593
 in run_work_thunk at multi.jl:602
 in anonymous at task.jl:6
UndefVarError(:genes)

I thought @spawnat would move the three data structures I need to all of the processes. My first thought is that the move may take a while and the parallel for loop tries to run before the data transfer is complete.

bdeonovic asked Dec 26 '14 16:12


1 Answer

The data is moved by @spawnat, but it is not bound to variables with the same names as on the master node. Instead, the data is stored in the fairly hidden Dict named Base.PGRP on the workers. To access the values, you have to fetch the RemoteRefs, which in your case would be something like:

result = @parallel (vcat) for i in 1:length(genes)
  EM(fetch(genes[i]), fetch(gene_dict[i]), fetch(transcript_dict[i]))
end
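
For readers on a newer Julia, here is a minimal sketch of the same idea. It assumes Julia 1.x, where @parallel has become @distributed in the Distributed standard library and @spawnat returns a Future rather than a RemoteRef. The EM body, the toy data, and the run_em helper are hypothetical stand-ins, not the code from the question. The sketch also uses a different technique from the one above for the loop itself: it wraps the loop in a function so that the data are local variables captured by the loop body.

```julia
using Distributed

# Assumption: two local worker processes are enough for illustration.
nprocs() == 1 && addprocs(2)

# Hypothetical stand-in for the question's EM function, defined on every process.
@everywhere function EM(gene, gene_dict, transcript_dict)
    return gene_dict[gene] + transcript_dict[gene]
end

# Hypothetical helper: the data arrive as local variables.
function run_em(genes, gene_dict, transcript_dict)
    # @spawnat returns a Future immediately; the value is held on the worker
    # and is only handed back when the Future is fetched.
    ref = @spawnat first(workers()) gene_dict
    fetched = fetch(ref)   # blocks until the value is available

    # Local variables referenced in the loop body are captured by the loop
    # closure and sent to the workers together with it.
    result = @distributed (vcat) for i in 1:length(genes)
        EM(genes[i], gene_dict, transcript_dict)
    end
    return fetched, result
end

# Toy data standing in for the large dictionaries.
genes = ["a", "b", "c"]
gene_dict = Dict("a" => 1, "b" => 2, "c" => 3)
transcript_dict = Dict("a" => 10, "b" => 20, "c" => 30)

fetched, result = run_em(genes, gene_dict, transcript_dict)
```

Because fetch blocks until the value is ready, the loop cannot start before the data transfer has finished.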

Andreas Noack answered Nov 15 '22 10:11