
H2O in Kubernetes

Tags:

kubernetes

h2o

Has anyone managed to run an H2O cluster in Kubernetes?

I tried two options, both using a flatfile:

  1. Using a StatefulSet. Since the IP assigned to a pod can change, the cluster is unreliable.
  2. Using several Service/Deployment pairs and putting each Service's DNS name in the flatfile. The cluster doesn't start up correctly.

Neither of these works. Is there any way to make it work?

Alessandro Magnani asked Mar 30 '17 06:03




1 Answer

If multicast packets can travel between the pods, you can rely on multicast for cluster formation. Just give every node the same cluster name with -name (a name unique to this cluster, shared by all of its nodes). This is easy if it works, and it needs no code changes.
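A minimal sketch of what "the same -name on every node" looks like as a launch command, e.g. for the container's entrypoint. The -name, -port and -flatfile options are H2O's own; the cluster name, jar path and the 4g heap are illustrative assumptions. Leaving out -flatfile is what makes H2O fall back to multicast discovery.

```python
import shlex
from typing import List, Optional


def h2o_command(cluster_name: str, jar_path: str = "h2o.jar",
                port: int = 54321, flatfile: Optional[str] = None,
                heap: str = "4g") -> List[str]:
    """Build the java argv for one H2O node.

    Every pod must receive the same cluster_name. When flatfile is None,
    no -flatfile option is passed, so H2O uses multicast to find peers.
    The jar path and heap size are placeholders, not recommendations.
    """
    cmd = ["java", f"-Xmx{heap}", "-jar", jar_path,
           "-name", cluster_name, "-port", str(port)]
    if flatfile is not None:
        cmd += ["-flatfile", flatfile]
    return cmd


if __name__ == "__main__":
    # Print the command as a shell-safe string, e.g. for a container spec.
    print(shlex.join(h2o_command("my-h2o-cluster")))
```

Because every pod runs the identical command, a single Deployment or StatefulSet template is enough; only the network has to carry the multicast traffic.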

UPDATE (2018/04/21): one of my colleagues says:

I used Weave as the network layer. It provides a connection between all the containers in that Kubernetes pod group, so you don't need the flatfile in H2O: H2O multicasts on startup, and Weave takes the multicast and delivers it to all instances of the pod.

In K8s, run this:

    kubectl apply --filename https://git.io/weave-kube-1.6
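Whether or not Weave is installed, it may be worth confirming that multicast actually crosses pod boundaries before relying on it. Below is a minimal probe sketch using only Python's standard socket module. The group address 239.255.0.1 and port 50000 are arbitrary test values and are not the group or port H2O itself uses; run `recv` in one pod and `send` in another.

```python
import socket
import struct
import sys

GROUP = "239.255.0.1"  # hypothetical test group (administratively scoped range)
PORT = 50000           # hypothetical test port


def membership_request(group: str) -> bytes:
    """Pack an ip_mreq struct: multicast group + local interface (any)."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))


def open_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Bind a UDP socket to the port and join the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    return sock


def send_probe(message: bytes, group: str = GROUP, port: int = PORT, ttl: int = 2) -> None:
    """Send one datagram to the group; a small TTL keeps it near the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(message, (group, port))


if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else ""
    if mode == "recv":
        with open_receiver() as rx:
            data, sender = rx.recvfrom(1024)  # blocks until a probe arrives
            print(f"received {data!r} from {sender[0]}")
    elif mode == "send":
        send_probe(b"hello from " + socket.gethostname().encode())
    else:
        print("usage: probe.py recv | send")
```

If the receiver never prints anything, multicast is probably being dropped by the pod network, and the flatfile route below is the one to take.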


If multicast is not an option, there isn't an out-of-the-box solution today for Kubernetes that I'm aware of.

You will need an orchestrator to distribute the flatfile information.
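One possible building block for such an orchestrator, sketched under assumptions: the H2O pods sit behind a headless Service (clusterIP: None), whose DNS name resolves to one A record per ready pod, and each node listens on H2O's default port 54321. The sketch turns those addresses into the flatfile format H2O reads, one `ip:port` per line. The Service name and output path you pass in are up to you; presumably the list should be complete, with all pods ready, before any H2O node starts.

```python
import socket
from typing import Iterable, List

H2O_PORT = 54321  # H2O's default node port


def resolve_pod_ips(service_dns: str) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses behind a DNS name.

    Assumes a headless Service, so the name resolves to the pod IPs
    rather than to a single virtual cluster IP.
    """
    infos = socket.getaddrinfo(service_dns, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def flatfile_text(ips: Iterable[str], port: int = H2O_PORT) -> str:
    """Render the addresses as flatfile lines: one ip:port per line."""
    return "".join(f"{ip}:{port}\n" for ip in ips)


def write_flatfile(path: str, ips: Iterable[str], port: int = H2O_PORT) -> None:
    """Write the flatfile that each node then passes to -flatfile."""
    with open(path, "w") as fh:
        fh.write(flatfile_text(ips, port))
```

Resolving pod IPs at startup sidesteps the problem from the question, where a StatefulSet pod can come back with a different IP, but only if the flatfile is regenerated and the whole cluster is restarted together.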

There are at least three code examples that do this for other environments in the H2O GitHub repos.

  1. EC2 scripts

https://github.com/h2oai/h2o-3/tree/master/ec2

  2. The Hadoop driver

https://github.com/h2oai/h2o-3/blob/master/h2o-hadoop/h2o-mapreduce-generic/src/main/java/water/hadoop/h2omapper.java

In particular, look at how this class gets overridden:

https://github.com/h2oai/h2o-3/blob/master/h2o-core/src/main/java/water/init/AbstractEmbeddedH2OConfig.java

  3. The Sparkling Water driver in the Sparkling Water repo.
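The general pattern these drivers follow can be sketched as a rendezvous: each node reports its own `ip:port` to a coordinator, the coordinator waits until the expected number of nodes has checked in, and then every node receives the same complete list to use as its flatfile. The toy below is an in-process illustration of that idea only; it is not H2O's code, the class name is hypothetical, and a Kubernetes version would need some transport (for example a small HTTP service) between pods and coordinator, with `expected_nodes` matching the replica count.

```python
import threading
from typing import List


class FlatfileRendezvous:
    """Hypothetical coordinator: collect ip:port entries from an expected
    number of nodes, then hand the complete, identical list to each one."""

    def __init__(self, expected_nodes: int) -> None:
        self.expected = expected_nodes
        self._members: List[str] = []
        self._cond = threading.Condition()

    def register(self, ip: str, port: int, timeout: float = 60.0) -> List[str]:
        """Called once per node; blocks until every node has registered."""
        with self._cond:
            entry = f"{ip}:{port}"
            if entry not in self._members:  # a retrying node is not counted twice
                self._members.append(entry)
            self._cond.notify_all()  # wake nodes waiting for the count to fill up
            done = self._cond.wait_for(
                lambda: len(self._members) >= self.expected, timeout)
            if not done:
                raise TimeoutError(
                    f"only {len(self._members)} of {self.expected} nodes registered")
            return sorted(self._members)  # same order for every node
```

Usage, simulating three nodes with threads: each thread calls `register(f"10.0.0.{i}", 54321)` and gets back the full three-entry list once the last one arrives. Blocking until the count is reached means no node starts with a partial flatfile, which is the failure mode the question ran into.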
TomKraljevic answered Oct 24 '22 13:10