 

Limit Apache Spark job running duration

Tags:

apache-spark

I want to submit a job to a cluster environment with a timeout parameter. Is there a way to make Spark kill a running job if it exceeds the allowed duration?

Asked Nov 09 '22 by Gem741


1 Answer

As of Spark 2.1.0, there is no built-in solution (it would be a very good feature to add!).

You can play with the speculation feature to re-launch long-running tasks, and with spark.task.maxFailures to kill tasks that have been re-launched too many times.
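For illustration, a minimal PySpark session with those settings turned on might look like the sketch below; the values are placeholders and should be tuned for your workload.

```python
from pyspark.sql import SparkSession

# Hypothetical settings; adjust the multiplier/quantile/maxFailures values
# to match how aggressive you want speculation and failure handling to be.
spark = (
    SparkSession.builder
    .appName("speculative-job")
    .config("spark.speculation", "true")          # re-launch tasks that run much longer than their peers
    .config("spark.speculation.multiplier", "3")  # a task counts as slow once it is 3x the median duration
    .config("spark.speculation.quantile", "0.9")  # only start speculating after 90% of a stage's tasks finish
    .config("spark.task.maxFailures", "4")        # give up on the job after 4 failures of the same task
    .getOrCreate()
)
```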

But this is absolutely not clean; Spark is missing a real "circuit breaker" to stop long-running tasks (such as the classic beginner mistake of SELECT * against a huge table).

On the other hand, you could use the Spark web UI's REST API (a sketch combining the two steps follows below):

1) Get running jobs: GET http://SPARK_CLUSTER_PROD/api/v1/applications/application_1502112083252_1942/jobs?status=running

(this returns an array of jobs with a submissionTime field that you can use to find long-running ones)

2) Kill the job: POST http://SPARK_CLUSTER_PROD/stages/stage/kill/?id=23881&terminate=true for each of the job's stages.
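Putting the two calls together, a minimal polling script might look like this. It is a sketch under assumptions: the host SPARK_CLUSTER_PROD, the application ID, and the one-hour timeout are placeholders, and it assumes the job JSON exposes jobId, submissionTime, and stageIds fields as in recent Spark versions; adapt the field names and the kill endpoint to your deployment.

```python
import datetime as dt

import requests

# Hypothetical endpoints and IDs; substitute your own cluster host,
# application ID, and timeout.
API_BASE = "http://SPARK_CLUSTER_PROD/api/v1"
UI_BASE = "http://SPARK_CLUSTER_PROD"
APP_ID = "application_1502112083252_1942"
MAX_DURATION = dt.timedelta(hours=1)


def parse_submission_time(value):
    # The REST API returns timestamps like "2017-08-07T14:02:31.929GMT".
    return dt.datetime.strptime(value.replace("GMT", ""), "%Y-%m-%dT%H:%M:%S.%f")


def kill_long_running_jobs():
    # Step 1: list the currently running jobs for the application.
    resp = requests.get(
        f"{API_BASE}/applications/{APP_ID}/jobs", params={"status": "running"}
    )
    resp.raise_for_status()
    now = dt.datetime.utcnow()

    for job in resp.json():
        age = now - parse_submission_time(job["submissionTime"])
        if age > MAX_DURATION:
            # Step 2: kill every stage of the offending job via the web UI endpoint.
            for stage_id in job["stageIds"]:
                requests.post(
                    f"{UI_BASE}/stages/stage/kill/",
                    params={"id": stage_id, "terminate": "true"},
                )
            print(f"Killed job {job['jobId']} after {age}")


if __name__ == "__main__":
    kill_long_running_jobs()
```

You could run such a script from cron or a monitoring loop to approximate the missing timeout parameter.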

I believe Spark also has a hidden API that you could try to use.

Answered Nov 15 '22 by Thomas Decaux