 

How to connect another machine to standalone h2o installation to create a cluster?

Tags:

h2o

I want to try out H2O at home, on my commodity computers. How can I join them into a cluster?

Do I need to create a Hadoop cluster first?

Where can I find documentation that could help me?

Adam Ryczkowski asked Jan 06 '23 04:01


1 Answer

No, a Hadoop cluster is not needed. The H2O documentation describes how to start nodes from the command line. (I also found it useful to read the EC2 setup docs, and then browse through the EC2 scripts they supply.)

Basically, you need to create a flatfile: a simple text file listing the IP address and port of each node in your cluster, one node per line. You can give the cluster a name, and I like to give the flatfile the same name; here the cluster is "lantest" and the flatfile is "lantest.txt".
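As a minimal sketch of what such a flatfile could contain, assuming three machines at the made-up addresses 192.168.1.10 to 192.168.1.12 that all use port 54321 (the make_flatfile helper is hypothetical and not part of H2O):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one "ip:port" line per node.
# First argument is the port, the remaining arguments are node addresses.
make_flatfile() {
    local port="$1"
    shift
    local ip
    for ip in "$@"; do
        printf '%s:%s\n' "$ip" "$port"
    done
}

# Placeholder addresses; substitute the addresses of your own machines.
make_flatfile 54321 192.168.1.10 192.168.1.11 192.168.1.12 > lantest.txt
cat lantest.txt
```

The resulting file has one line per node, such as 192.168.1.10:54321, and the same file is copied to every machine.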

Then you need to get h2o.jar on each machine, and put your flatfile in the same directory (again, on each machine). Then start it on each machine with:

java -Xmx2G -ea -jar h2o.jar -name lantest -ip 192.168.x.y -port 54321 -flatfile lantest.txt

Keep that console window open, as log messages will be written to it.

Typically you change .x.y for each machine, but everything else stays the same. The -Xmx2G says I'm giving the JVM on each machine a 2GB maximum heap; you might want to adjust that (but it must be exactly the same for every node).
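Since only the IP address differs between machines, the command above could be wrapped in a small script. This is only a sketch: the h2o_command helper and the variable names are hypothetical, and the address is a placeholder. It prints the command rather than running it, so you can inspect it first.

```shell
#!/usr/bin/env bash
# Settings that stay the same on every node (values taken from the command above).
NAME="lantest"   # cluster name; the flatfile is assumed to be "$NAME.txt"
HEAP="2G"        # passed to -Xmx
PORT=54321

# Hypothetical helper: print the launch command for the node with the given IP.
h2o_command() {
    local ip="$1"
    printf 'java -Xmx%s -ea -jar h2o.jar -name %s -ip %s -port %s -flatfile %s.txt\n' \
        "$HEAP" "$NAME" "$ip" "$PORT" "$NAME"
}

# Placeholder address; use the address of the machine you are on.
h2o_command 192.168.1.10
```

To actually start the node, copy the printed line into the console on that machine.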

Something else that must be exactly the same is the version of h2o.jar: even a minor version difference is not acceptable, as H2O checks the md5 checksum of the jar!
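One way to check this before starting anything is to compare the checksums yourself. The sketch below assumes GNU md5sum is installed (macOS ships md5 instead) and that you have copies of the jars somewhere you can read them; the same_md5 helper is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if all given files have the same md5 checksum,
# 1 if any of them differs, 2 if a file cannot be read.
same_md5() {
    local first="" sum f
    for f in "$@"; do
        [ -r "$f" ] || return 2
        sum="$(md5sum "$f" | cut -d' ' -f1)"
        if [ -z "$first" ]; then
            first="$sum"
        elif [ "$sum" != "$first" ]; then
            return 1
        fi
    done
    return 0
}

# Print the checksum of the local jar, if there is one in this directory.
if [ -f h2o.jar ]; then
    md5sum h2o.jar
fi
```

Simply running md5sum h2o.jar on each machine and comparing the printed values by eye works just as well.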

The other thing you might struggle with is firewalls. Every node has to be able to reach every other node on ports 54321 and 54322, so open those ports on the firewall on each machine. (On Windows, I also had to open access to Java.)
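To see whether the firewall is the problem, a rough connectivity check could look like the following. It is a sketch under assumptions: it relies on bash's /dev/tcp redirection and the coreutils timeout command, it only tests TCP, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every "ip:port" line of a flatfile, print the two
# host/port pairs that must be reachable (the port itself and the next one).
required_ports() {
    local line ip port
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        ip="${line%:*}"
        port="${line##*:}"
        printf '%s %s\n%s %s\n' "$ip" "$port" "$ip" "$((port + 1))"
    done < "$1"
}

# Succeeds if a TCP connection to host $1, port $2 opens within 2 seconds.
port_open() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Report the reachability of every required port listed in flatfile $1.
check_flatfile() {
    local ip port
    required_ports "$1" | while read -r ip port; do
        if port_open "$ip" "$port"; then
            echo "$ip:$port reachable"
        else
            echo "$ip:$port NOT reachable"
        fi
    done
}
```

Calling check_flatfile lantest.txt on each machine, once H2O is running on the others, shows which node and port pairs are still blocked.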

Darren Cook answered May 22 '23 21:05