I'm trying to use remote processes in conjunction with local processes, but when I do I get the following output:
julia> addprocs(["user@host"], tunnel=true, dir="~/julia-79599ada44/bin/", sshflags=`-p 6969`)
id: cannot find name for group ID 350
1-element Array{Any,1}:
2
julia> addprocs(23)
fatal error on 2: ERROR: connect: host is unreachable (EHOSTUNREACH)
in wait at ./task.jl:284
in wait at ./task.jl:194
in stream_wait at stream.jl:263
in wait_connected at stream.jl:301
in Worker at multi.jl:113
in anonymous at task.jl:905
fatal error on fatal error on 5: 6: fatal error on fatal error on fatal error on 9: 14: 8: Worker 3 terminated.
...
I have tried adding the local processes first but I get the same errors when I add the remote ones.
I know the question is old, but I was asked today whether I knew the answer to this unanswered question.
You could use the -p option along with the --machinefile option:
Julia can be started in parallel mode with either the -p or the --machine-file options. -p n will launch an additional n worker processes, while --machine-file file will launch a worker for each line in file file. The machines defined in file must be accessible via a password-less ssh login, with Julia installed at the same location as the current host. Each machine definition takes the form [count*][user@]host[:port] [bind_addr[:port]]. user defaults to current user, port to the standard ssh port. count is the number of workers to spawn on the node, and defaults to 1. The optional bind-to bind_addr[:port] specifies the IP address and port that other workers should use to connect to this worker.
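As a rough illustration of that format, the sketch below writes a small machine file from the shell. The file name machinefile_example.txt and the worker counts are made up for illustration; user@host and port 6969 are the placeholders from the question, and I have not checked how every Julia version handles the count and port fields.

```shell
# Hypothetical example file; the name and the counts are illustrative only.
cat > machinefile_example.txt <<'EOF'
user@host
4*user@host
2*user@host:6969
EOF

# Show the result: one machine definition per line.
cat machinefile_example.txt
```

Going by the quoted format, the first line asks for one worker (count defaults to 1), the second for four workers, and the third for two workers reached over ssh port 6969.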
It has been a long time since I used the --machinefile option. In my case the count prefix (n) didn't work, and I don't know whether it has been fixed, but you could instead add one line for each worker process you want. For example, if this doesn't work for you:
# machinefile.txt
23*user@host
You could try this:
# machinefile.txt
user@host
user@host
user@host
user@host
user@host
user@host
user@host
user@host
user@host
user@host
user@host
user@host
user@host
user@host
user@host
user@host
user@host
user@host
user@host
user@host
user@host
user@host
user@host
And then invoke Julia like this:
$ julia -p 2 --machinefile machinefile.txt
This gives a total of 25 worker processes (2 local and 23 remote).
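Typing 23 identical lines by hand is easy to get wrong, so here is a minimal sketch of generating the file from the shell instead. The variable names are my own, and user@host, 23 and machinefile.txt are just the placeholder values used above; adjust them to your setup.

```shell
# Hypothetical placeholders: adjust to your own login, worker count, and file name.
remote="user@host"
workers=23
machinefile="machinefile.txt"

# Start from an empty file, overwriting any existing one.
: > "$machinefile"

# Append one machine definition per remote worker.
i=0
while [ "$i" -lt "$workers" ]; do
    printf '%s\n' "$remote" >> "$machinefile"
    i=$((i + 1))
done

# Sanity check: the line count should equal the number of remote workers.
wc -l < "$machinefile"
```

The resulting file can then be passed to the julia command shown above.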
But the count prefix should work, since it is documented. If it doesn't, please check whether a bug has already been reported and, if not, open a new one.