Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Apache flink on Kubernetes - Resume job if jobmanager crashes

I want to run a flink job on kubernetes, using a (persistent) state backend it seems like crashing taskmanagers are no issue as they can ask the jobmanager which checkpoint they need to recover from, if I understand correctly.

A crashing jobmanager seems to be a bit more difficult. On this flip-6 page I read zookeeper is needed to be able to know what checkpoint the jobmanager needs to use to recover and for leader election.

Seeing as kubernetes will restart the jobmanager whenever it crashes is there a way for the new jobmanager to resume the job without having to setup a zookeeper cluster?

The current solution we are looking at is: when kubernetes wants to kill the jobmanager (because it want to move it to another vm for example) and then create a savepoint, but this would only work for graceful shutdowns.

Edit: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-HA-with-Kubernetes-without-Zookeeper-td15033.html seems to be interesting but has no follow-up

like image 212
Richard Deurwaarder Avatar asked Aug 30 '18 20:08

Richard Deurwaarder


2 Answers

Out of the box, Flink requires a ZooKeeper cluster to recover from JobManager crashes. However, I think you can have a lightweight implementation of the HighAvailabilityServices, CompletedCheckpointStore, CheckpointIDCounter and SubmittedJobGraphStore which can bring you quite far.

Given that you have only one JobManager running at all times (not entirely sure whether K8s can guarantee this) and that you have a persistent storage location, you could implement a CompletedCheckpointStore which retrieves the completed checkpoints from the persistent storage system (e.g. reading all stored checkpoint files). Additionally, you would have a file which contains the current checkpoint id counter for CheckpointIDCounter and all the submitted job graphs for the SubmittedJobGraphStore. So the basic idea is to store everything on a persistent volume which is accessible by the single JobManager.

like image 143
Till Rohrmann Avatar answered Nov 07 '22 18:11

Till Rohrmann


I implemented a light version of file-based HA, based on Till's answer and Xeli's partial implementation.
You can find the code in this github repo - runs well in production.

Also wrote a blog series explaining how to run a job cluster on k8s in general and about this file-based HA implementation specifically.

like image 21
ronlut Avatar answered Nov 07 '22 18:11

ronlut