 

multi-computer makePSOCKcluster on Windows: Building a step-by-step guide

I've been trying to build a cluster using multiple computers for three days now and have failed spectacularly. So now I'm going to try to suck a bunch of you into solving my problem for me. If all goes well, I hope we can generate a step-by-step guide to use as a reference in the future, because so far I haven't managed to find a decent reference for setting this up (perhaps it's too specific a task?).

In my case, let's assume Windows 7, with PuTTY as the SSH client, and 'localhost' is going to serve as the master.

Furthermore, let's assume only two computers on the same network for now. I imagine the process will generalize easily enough that if I can get it to work on two computers, I can get it to work on three. So we'll work on localhost and remote-computer.

Here's what I've gathered so far (with references linked at the bottom):

  1. Install PuTTY on localhost.
  2. Install PuTTY on remote-computer
  3. Install an SSH server on remote-computer
  4. Assign it a port to listen on? (I'm not sure about this step)
  5. Install R on localhost
  6. Install the same version of R on remote-computer
  7. Add R to the PATH environment variable on both localhost and remote-computer
  8. Run the R code below from localhost

code:

library(parallel)
cl <- makePSOCKcluster(c(rep("localhost", 2),
                         rep("remote-computer", 2)))

So far, I've done steps 1-3, not sure if I need to do 4, done 5-7, and the code for step 8 just hangs indefinitely.

When I check my SSH server logs, it doesn't appear that I'm hitting the SSH server from localhost. So it appears that my first problem is configuring the SSH correctly. Has anyone succeeded in doing this and would you be willing to share your expertise?
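One possible factor, offered as an assumption rather than a diagnosis: the `parallel` documentation lists `ssh` as the default `rshcmd` for launching remote workers, and nothing in the call above points it at PuTTY's `plink.exe`. The sketch below only assembles the relevant options (`rshcmd`, `user`, `master` and `manual` are documented `makePSOCKcluster` options). `plink_cluster_args` is a hypothetical helper, and the plink path and bracketed values are placeholders.

```r
library(parallel)

# Hypothetical helper (not part of 'parallel'): gathers the launch options so
# they can be inspected before any connection is attempted.
plink_cluster_args <- function(hosts, master,
                               plink = "C:/PuTTYPath/plink.exe",
                               user = "[username]",
                               password = "[password]") {
  list(
    names  = hosts,                          # worker host names or IP addresses
    master = master,                         # address the workers connect back to
    user   = user,                           # user name handed to the remote shell command
    rshcmd = paste(plink, "-pw", password),  # used in place of the default "ssh"
    manual = FALSE                           # TRUE means the workers are started by hand
  )
}

args <- plink_cluster_args("remote-computer", master = "[local_ip_address]")
str(args)

# Left commented out because it blocks until every worker has connected:
# cl <- do.call(makePSOCKcluster, args)
```

Setting `manual = TRUE` can help with debugging, since the workers then have to be started by hand and the launch command can be tried outside R.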

EDIT: Oops, I forgot the references:

http://www.milanor.net/blog/wp-content/uploads/2013/10/03.FirstStepinParallelComputing.pdf

R Parallel - connecting to remote cores

https://stat.ethz.ch/pipermail/r-sig-hpc/2010-October/000780.html

Benjamin asked Sep 14 '15 19:09


1 Answer

At best, this is a partial answer. I'm still not establishing a cluster, but the steps described here are a pretty good record of how I've gotten to this point.

CONFIGURATIONS:

  1. Install PuTTY on 'remote-computer'
  2. Install SSH server on 'remote-computer'
  3. Install R on 'remote-computer' (Use the same version of R as on 'localhost')
  4. Add R to the PATH

  5. Install PuTTY on 'localhost'

  6. Install R on 'localhost'
  7. Add R to the PATH
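The PATH and version steps above can be checked from R before testing any connection. This is a minimal sketch using only base R. It assumes it is run once on each machine and the output compared by eye. The commands further down call plink by its full path, so an empty result for `plink` is not necessarily a problem.

```r
# Look up the executables the cluster launch relies on; "" means not found.
tools <- Sys.which(c("Rscript", "plink"))
print(tools)

not_found <- names(tools)[tools == ""]
if (length(not_found) > 0) {
  message("Not on PATH: ", paste(not_found, collapse = ", "))
}

# Print the R version so it can be compared with the other machine.
print(getRversion())
```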

TESTING THE CONNECTION: PHASE I

  1. From the command line, run

C:\PuTTYPath\plink.exe -pw [password] [username]@[remote_ip_address] Rscript -e rnorm(100)

(Confirm return of 100 normal random variates)

  2. From the command line, run

C:\PuTTYPath\plink.exe -pw [password] [username]@[remote_ip_address] Rscript -e parallel:::.slaveRSOCK() MASTER=[local_ip_address] PORT=100501 OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE

(Confirm that a session is started on the SSH server logs on 'remote-computer')

TESTING THE CONNECTION: PHASE II

  1. From an R Session, run

    system(paste0("C:/PuTTYPath/plink.exe -pw [password] ", "[username]@[remote_ip_address] ", "RScript -e rnorm(100)"))

    (Confirm return of 100 normal random variates)

  2. From an R session, run

    system(paste0("C:/PuTTYPath/plink.exe ",
                  "-pw [password] ",
                  "[username]@[remote_ip_address] ",
                  "Rscript -e parallel:::.slaveRSOCK() ",
                  "MASTER=[local_ip_address] ",
                  "PORT=100501 ",
                  "OUT=/dev/null ",
                  "TIMEOUT=2592000 ",
                  "METHODS=TRUE ",
                  "XDR=TRUE"))

(Confirm that a session is started and maintained on the SSH server logs on 'remote-computer')
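The Phase II checks can be wrapped so the exit status is captured instead of read off the console. This is a rough sketch: all three function names are hypothetical, and `run_remote_r` is defined but not called because it needs a reachable SSH server. The port check relies on TCP ports being 16-bit values (1 to 65535). Under that check the `PORT=100501` value used in the commands above would be rejected, which may be worth ruling out as a factor.

```r
# Hypothetical helpers; none of these names come from PuTTY or from R itself.

# TCP ports are 16-bit, so a usable value lies between 1 and 65535.
is_valid_port <- function(port) {
  is.numeric(port) && length(port) == 1L && !is.na(port) &&
    port == trunc(port) && port >= 1 && port <= 65535
}

# Build the plink argument vector for running an R expression remotely,
# mirroring the command-line form used in the steps above.
plink_rscript_args <- function(user, host, password, expr) {
  c("-pw", password, paste0(user, "@", host), "Rscript", "-e", expr)
}

# Run the remote command and report whether it exited cleanly.
# system2() attaches a "status" attribute only when the exit code is non-zero.
run_remote_r <- function(plink, user, host, password, expr) {
  out <- system2(plink, plink_rscript_args(user, host, password, expr),
                 stdout = TRUE, stderr = TRUE)
  status <- attr(out, "status")
  list(ok = is.null(status) || status == 0, output = out)
}

# 11000 is an arbitrary in-range example, not a value taken from the post.
print(is_valid_port(11000))
print(is_valid_port(100501))
print(plink_rscript_args("[username]", "[remote_ip_address]", "[password]",
                         "rnorm(100)"))
```

A call such as `run_remote_r("C:/PuTTYPath/plink.exe", "[username]", "[remote_ip_address]", "[password]", "rnorm(100)")` would then return the captured output together with an `ok` flag.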

ESTABLISH A CLUSTER

  1. From an R Session, run

    library(snow)
    cl <- makeCluster(spec = c("localhost", "[remote_ip_address]"),
                      rshcmd = "C:/PuTTYPath/plink.exe -pw [password]",
                      host = "[local_ip_address]")

(A session should be started and maintained on the SSH server logs on 'remote-computer'. Ideally, the function will complete and 'cl' will be assigned.)

Establishing the cluster is the point at which I'm failing. I run makeCluster and watch my SSH server logs. It shows a connection is made and then immediately closed. makeCluster never finishes running, cl is not assigned, and I'm stuck on how to go on. I'm not even sure if this is an R problem or a configuration problem at this point.

EDIT AND RESOLUTION:

For no good reason, I tried running this with the snow package, as shown in the "Establish a Cluster" section above. With snow, the cluster builds and runs stably. I'm not sure why I couldn't get this to work with the parallel package, but at least I've got something functional.
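Once a cluster object exists, a quick way to confirm that the workers really are where you expect is to ask each one for its host name and process id. The sketch below uses a stand-in cluster of two local workers from `parallel` so it can run on a single machine; `cl_local` is a hypothetical name. The `parallel` documentation says its functions accept clusters created by snow, and snow exports `clusterEvalQ` and `stopCluster` under the same names, so the same calls should apply to the `cl` built above.

```r
library(parallel)

# Stand-in cluster so the sketch runs on one machine; for the real check,
# use the 'cl' object returned by makeCluster() instead.
cl_local <- makePSOCKcluster(2)

# Evaluate an expression on every worker and collect one result per worker.
hosts <- clusterEvalQ(cl_local, Sys.info()[["nodename"]])
pids  <- clusterEvalQ(cl_local, Sys.getpid())

print(unlist(hosts))
print(unlist(pids))

# Shut the workers down so no stray R processes are left behind.
stopCluster(cl_local)
```

On the two-machine setup, the host names returned should include 'remote-computer'.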

Benjamin answered Nov 12 '22 16:11