I am using the makeCluster function from the R package snow on a Linux machine to start a SOCK cluster on a remote Linux machine. Everything seems to be set up for the two machines to communicate successfully (I am able to establish ssh connections between the two). But:
makeCluster("192.168.128.24",type="SOCK")
does not return anything; it just hangs indefinitely.
What am I doing wrong?
Thanks a lot
Unfortunately, there are a lot of things that can go wrong when creating a snow (or parallel) cluster object, and the most common failure mode is to hang indefinitely. The problem is that makeSOCKcluster launches the cluster workers one by one, and each worker (if successfully started) must make a socket connection back to the master before the master proceeds to launch the next worker. If any of the workers fail to connect back to the master, makeSOCKcluster will hang without any error message. The worker may issue an error message, but by default any error message is redirected to /dev/null.
In addition to ssh problems, there are many other reasons why makeSOCKcluster could hang.
In other words, no one can diagnose this problem without further information, so you have to do some troubleshooting in order to get that information.
In my experience, the single most useful troubleshooting technique is manual mode, which you enable by specifying manual=TRUE when creating the cluster object. It's also a good idea to set outfile="" so that error messages from the workers aren't redirected to /dev/null:
cl <- makeSOCKcluster("192.168.128.24", manual=TRUE, outfile="")
makeSOCKcluster will display an Rscript command to execute in a terminal on the specified machine, and then it will wait for you to execute that command. In other words, makeSOCKcluster will hang until you manually start the worker on host 192.168.128.24, in your case. Remember that this is a troubleshooting technique, not a solution to the problem, and the hope is to get more information about why the workers aren't starting by trying to start them manually.
Obviously, the use of manual mode bypasses any ssh issues (since you're not using ssh), so if you can create a SOCK cluster successfully in manual mode, then probably ssh is your problem. If the Rscript command isn't found, then either R isn't installed, or it's installed in a different location. But hopefully you'll get some error message that will lead you to the solution.
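If manual mode works and you suspect ssh or the Rscript location, you can check both from the master in one step. The following is only a rough sketch: remote_rscript_args and find_remote_rscript are hypothetical helper names (they are not part of snow), and the code assumes an ssh client on the master and key-based (passwordless) login to the worker.

```r
# Hypothetical helper (not part of snow): build the ssh arguments that ask
# the remote host where Rscript lives.
remote_rscript_args <- function(host, user = NULL) {
  target <- if (is.null(user)) host else paste0(user, "@", host)
  # BatchMode=yes makes ssh fail instead of prompting for a password;
  # ConnectTimeout keeps the check from hanging on an unreachable host.
  c("-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
    target, "command -v Rscript")
}

# Hypothetical helper: returns the remote Rscript path, or NA if ssh fails
# or Rscript is not on the remote PATH.
find_remote_rscript <- function(host, user = NULL) {
  out <- tryCatch(
    suppressWarnings(
      system2("ssh", remote_rscript_args(host, user),
              stdout = TRUE, stderr = FALSE)
    ),
    error = function(e) character(0)
  )
  status <- attr(out, "status")  # only set when the command exits non-zero
  if (length(out) == 0 || (!is.null(status) && status != 0)) {
    return(NA_character_)
  }
  out[[1]]
}
```

Calling find_remote_rscript("192.168.128.24") should return a path if both ssh and Rscript are fine, and NA otherwise. If the path differs from the one on the master, snow has an rscript cluster option for pointing at it, but check the documentation of your installed version before relying on that.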
If makeSOCKcluster still just hangs after you've executed the specified Rscript command on the specified machine, then you probably have a networking or firewall issue.
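To test the networking side directly, you can try opening a plain TCP connection from the worker machine back to the master while makeSOCKcluster is waiting. This is a minimal sketch, not part of snow: can_connect is a hypothetical helper, and the master's host name and port have to be taken from the Rscript command that manual mode printed.

```r
# Hypothetical helper: TRUE if a TCP connection to host:port can be opened
# within `timeout` seconds, FALSE otherwise.
can_connect <- function(host, port, timeout = 5) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, server = FALSE,
                       blocking = TRUE, open = "r+", timeout = timeout)
    ),
    error = function(e) NULL  # refused, timed out, or host unreachable
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}
```

Run it in an R session on the worker, for example can_connect("master-host", 10187), where both values are placeholders for whatever the printed command shows. A FALSE result points at a firewall, routing, or host name resolution problem between the two machines. Note that the master will most likely treat a successful probe as the worker's connection, so interrupt makeSOCKcluster and create the cluster again afterwards.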
For more troubleshooting advice, see my answer to the question "making cluster in doParallel / snowfall hangs".