Start multiple h2o cluster from within R

My intention is to start two or more h2o clusters / instances (not two or more nodes!) from within R on the same computer/server so that multiple users can connect to h2o at the same time. In addition, I want to be able to shut down and restart clusters separately, also from within R.

I already know that I cannot control multiple h2o clusters simply from within R, so I tried starting two clusters from the Windows 10 command line:

java -Xmx1g -jar h2o.jar -name testCluster1 -nthreads 1  -port 54321
java -Xmx1g -jar h2o.jar -name testCluster2 -nthreads 1  -port 54323

This works fine for me:

library(h2o)

h2o.init(startH2O = FALSE, ip = "localhost", port = 54321) 
Connection successful!

R is connected to the H2O cluster: 
H2O cluster uptime:         4 minutes 8 seconds 
H2O cluster version:        3.8.3.2 
H2O cluster name:           testCluster 
H2O cluster total nodes:    1 
H2O cluster total memory:   0.87 GB 
H2O cluster total cores:    4 
H2O cluster allowed cores:  1 
H2O cluster healthy:        TRUE 
H2O Connection ip:          localhost 
H2O Connection port:        54321 
H2O Connection proxy:       NA 
R Version:                  R version 3.2.5 (2016-04-14) 

h2o.init(startH2O = FALSE, ip = "localhost", port = 54323)
Connection successful!

R is connected to the H2O cluster: 
H2O cluster uptime:         3 minutes 32 seconds 
H2O cluster version:        3.8.3.2 
H2O cluster name:           testCluster2 
H2O cluster total nodes:    1 
H2O cluster total memory:   0.87 GB 
H2O cluster total cores:    4 
H2O cluster allowed cores:  1 
H2O cluster healthy:        TRUE 
H2O Connection ip:          localhost 
H2O Connection port:        54323 
H2O Connection proxy:       NA 
R Version:                  R version 3.2.5 (2016-04-14) 

Now I want to do the same from within R via the system() function:

launchH2O <-  as.character("java -Xmx1g -jar h2o.jar -name testCluster -nthreads 1  -port 54321")
system(command = launchH2O, intern =TRUE)

But I get an error message:

[1] "Error: Unable to access jarfile h2o.jar"
attr(,"status")
[1] 1
Warning message:
running command 'java -Xmx1g -jar h2o.jar -name testCluster -nthreads 1  -port 54321' had status 1 

With system2(command = launchH2O) instead, I get a warning message and cannot connect to the cluster:

system2(command = launchH2O)
Warning message:
running command '"java -Xmx1g -jar h2o.jar -name testCluster -nthreads 1  -port 54321"' had status 127 

h2o.init(startH2O = FALSE, ip = "localhost", port = 54321)
Error in h2o.init(startH2O = FALSE, ip = "localhost", port = 54321) : 
Cannot connect to H2O server. Please check that H2O is running at http://localhost:54321/

Any ideas how to start / shutdown two or more h2o clusters from within R? Thank you in advance!

Note 1: I am only using my local Windows device for testing; I actually want to create multiple h2o clusters on a Linux server.

Note 2: I tried it with both R GUI (3.2.5) and RStudio (Version 0.99.892), and I ran them as admin. The h2o.jar file is in my working directory and my Java version is 1.8.0_91 (build 1.8.0_91-b14).

Note 3: System information:
- h2o & h2o R package version: 3.8.3.2
- Windows 10 Home, Version 1511
- 16 GB RAM, Intel Core i5-6200U CPU @ 2.30 GHz

asked Jul 18 '16 15:07 by constiii

1 Answer

EDIT: Based on comments, I've changed the examples below to use intern=FALSE.

You probably just need to change to the directory that contains h2o.jar; the problem is either that or not setting wait=FALSE (which runs the command in the background).

launchH2O <- "java -Xmx1g -jar h2o.jar -name testCluster -nthreads 1 -port 54321"
savewd <- setwd("/path/to/h2ojar/")
system(command = launchH2O, intern = FALSE, wait = FALSE)
setwd(savewd)

The last line and the assignment to savewd just restore the original working directory. Alternatively, this should also work:

launchH2O <- "java -Xmx1g -jar /path/to/h2ojar/h2o.jar -name testCluster -nthreads 1 -port 54321"
system(command = launchH2O, intern =FALSE, wait=FALSE)
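
The same launch can also be done with system2(), but only the program name belongs in `command`. system2() shell-quotes that argument, which is why the warning in the question shows the entire command line wrapped in quotes and reports status 127 (the command could not be run); the flags have to go in `args`. A minimal sketch, assuming java is on the PATH; the helpers h2o_args() and launch_h2o() and their parameters are hypothetical, not part of h2o:

```r
# Hypothetical helper: build the java argument vector for one H2O instance.
# jar_path, name, port, mem and nthreads are illustrative parameters.
h2o_args <- function(jar_path, name, port, mem = "1g", nthreads = 1) {
  c(paste0("-Xmx", mem),
    "-jar", shQuote(jar_path),
    "-name", name,
    "-nthreads", as.character(nthreads),
    "-port", as.character(port))
}

# Hypothetical launcher: system2() quotes `command` but passes `args` as-is,
# so only "java" goes in `command`. wait = FALSE returns immediately.
launch_h2o <- function(jar_path, name, port, ...) {
  # Fail in R right away if the jar is not where we think it is.
  jar_path <- normalizePath(jar_path, mustWork = TRUE)
  system2("java", args = h2o_args(jar_path, name, port, ...), wait = FALSE)
}

# Example (assumes h2o.jar exists at this placeholder path):
# launch_h2o("/path/to/h2ojar/h2o.jar", "testCluster1", 54321)
```

Resolving the path with normalizePath(mustWork = TRUE) turns a missing jar into an immediate R error instead of a later "Unable to access jarfile" from java.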

When on Linux, there is another way:

launchH2O <- "bash -c 'nohup java -Xmx1g -jar /path/to/h2ojar/h2o.jar -name testCluster -nthreads 1 -port 54321 &'"
system(command = launchH2O, intern =FALSE)

(Because the last command explicitly puts it in the background, I don't think you need to set wait=FALSE.)
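
For the original goal (several independent instances that can be stopped one at a time), the nohup approach can be wrapped in a few functions. This is a sketch under stated assumptions: Linux with bash, no spaces in the jar path, and the h2o R package installed; the helper names, log file names and port layout are illustrative, not part of h2o. Ports are spaced by 2 because each H2O instance also uses port + 1 internally, which matches the 54321/54323 choice in the question, and the distinct -name values should keep the instances from joining as nodes of one cluster. To stop a single instance, the sketch connects to it with h2o.init(startH2O = FALSE) and then calls h2o.shutdown(), which acts on the instance R is currently connected to.

```r
# Hypothetical helper: ports for n instances, spaced by 2 because each
# H2O instance also occupies port + 1.
cluster_ports <- function(n, base_port = 54321L) {
  base_port + 2L * (seq_len(n) - 1L)
}

# Hypothetical helper: Linux shell command that starts one instance in the
# background and logs to <name>.log (assumes no spaces in jar or name).
h2o_launch_cmd <- function(jar, name, port, mem = "1g", nthreads = 1L) {
  sprintf("bash -c 'nohup java -Xmx%s -jar %s -name %s -nthreads %d -port %d > %s.log 2>&1 &'",
          mem, jar, name, as.integer(nthreads), as.integer(port), name)
}

# Start one instance per name; returns the ports, named by cluster name.
start_clusters <- function(jar, names, base_port = 54321L) {
  ports <- cluster_ports(length(names), base_port)
  for (i in seq_along(names)) {
    system(h2o_launch_cmd(jar, names[i], ports[i]), intern = FALSE)
  }
  setNames(ports, names)
}

# Stop a single instance: switch R's connection to it, then shut it down.
stop_cluster <- function(port, ip = "localhost") {
  h2o::h2o.init(ip = ip, port = port, startH2O = FALSE)
  h2o::h2o.shutdown(prompt = FALSE)
}

# Usage (placeholder jar path):
# ports <- start_clusters("/path/to/h2ojar/h2o.jar", c("testCluster1", "testCluster2"))
# stop_cluster(ports[["testCluster2"]])  # testCluster1 keeps running
```

As the transcript in the question shows, calling h2o.init(startH2O = FALSE, port = ...) switches R to that instance, so each user session would connect to its own port before running other h2o functions.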

answered Sep 26 '22 04:09 by Darren Cook