How to achieve JobManager High Availability in a Kubernetes Flink Cluster?

Question

The official Flink documentation describes a JobManager high availability setup for standalone and YARN clusters. But what should be done to get high availability when Flink runs on Kubernetes?

From the Kubernetes Setup section of the documentation, it seems that only a single JobManager is deployed on a Kubernetes cluster. So how can HA be achieved for a Flink cluster on Kubernetes?

Ryan Dawson · Accepted Answer

The official documentation says that JobManager high availability exists to handle the case where the JobManager crashes. So only a single JobManager is needed, but you want to cover the case where it goes down. On Kubernetes, if the JobManager Pod fails, Kubernetes should detect this and restart it automatically, so you don't need to run more replicas of it.

(The documentation says this explicitly for HA on YARN. It doesn't seem to state it for Kubernetes, but restarting failed Pods is standard Kubernetes behaviour.)

In the official Kubernetes resources, the TaskManager is configured by default to run with multiple replicas (see the 'replicas' entries in the resources), but the JobManager is not. The same is true of the Helm chart. So I believe extra replicas are not needed for the JobManager; I'd suggest running a single JobManager unless you hit specific problems with that.
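To illustrate the 'replicas' point, here is a minimal sketch of what a single-replica JobManager Deployment could look like. It is modelled loosely on the official resources, but the names, labels, image tag and the liveness probe are assumptions for illustration, so check them against the resources for your Flink version. Pods managed by a Deployment are always restarted when their container exits, which is the restart behaviour relied on above.

```yaml
# Sketch only: names, labels and image tag are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1                 # a single JobManager; Kubernetes recreates the Pod if it fails
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
        - name: jobmanager
          image: flink:latest     # assumed tag; pin a specific version in practice
          args: ["jobmanager"]
          ports:
            - containerPort: 6123
              name: rpc
            - containerPort: 8081
              name: webui
          # Assumed probe: restart the container if the RPC port stops accepting connections.
          livenessProbe:
            tcpSocket:
              port: 6123
            initialDelaySeconds: 30
            periodSeconds: 60
```

This only shows the restart side of things. Whether running jobs are picked up again after a restart depends on how Flink itself is configured, which this sketch does not cover.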

Tags:

kubernetes

apache-flink

YuFeng Shen
