I want to try out H2O at home, on my commodity computers. How can I join them into a cluster?
Do I need to create a Hadoop cluster first?
Where can I find documentation that could help me?
No, a Hadoop cluster is not needed. Here is the documentation for starting nodes from the command line. (I also found it useful to read the EC2 setup docs, and then browse through the EC2 scripts they supply.)
Basically you need to create a flatfile, which is a simple text file listing the IP address and port of each node in your cluster. You can give the cluster a name, and I like to name the flatfile after it; here the cluster is called "lantest" and the flatfile "lantest.txt".
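For example, a flatfile for a three-node cluster might look like this (the addresses are made-up placeholders; the format is one ip:port entry per line):

192.168.1.10:54321
192.168.1.11:54321
192.168.1.12:54321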
Then you need to get h2o.jar onto each machine and put your flatfile in the same directory (again, on each machine). Start H2O on each machine with:
java -Xmx2G -ea -jar h2o.jar -name lantest -ip 192.168.x.y -port 54321 -flatfile lantest.txt
Keep that console window open, as log messages will be written to it.
Typically you change .x.y for each machine, but everything else stays the same. The -Xmx2G says I'm giving each machine 2GB of memory; you might want to adjust that (but it must be exactly the same for every node).
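For instance, if one node is at 192.168.1.10 and another at 192.168.1.11 (the made-up addresses from the example flatfile above), the command on the second machine differs only in the -ip value:

java -Xmx2G -ea -jar h2o.jar -name lantest -ip 192.168.1.11 -port 54321 -flatfile lantest.txt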
Something else that must be exactly the same is the version of h2o.jar: even a minor version difference isn't good enough, because the nodes compare the MD5 checksum of the jar!
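If you want to double-check that the jars really are identical, you can compare checksums yourself; on Linux, running this on each machine should produce the same hash:

md5sum h2o.jar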
The other thing you might struggle with is firewalls. Each node has to be able to see each other node on ports 54321 and 54322. So open those ports on the firewall on each machine. (On Windows, I also had to open access to Java.)
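As a rough sketch, on a Linux machine running ufw the ports could be opened like this (adjust for your own firewall; I open both TCP and UDP, since H2O traffic may use either):

sudo ufw allow 54321:54322/tcp
sudo ufw allow 54321:54322/udp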