I want to submit a job to a cluster with a timeout parameter. Is there a way to make Spark kill a running job if it has exceeded the allowed duration?
As of Spark 2.1.0, there is no built-in solution (it would be a very good feature to add!).
You can play with the speculation feature to re-launch long-running tasks, and with spark.task.maxFailures to kill tasks that have been re-launched too many times. But this is absolutely not clean: Spark is missing a real "circuit breaker" to stop a long-running task (such as the naive SELECT * FROM DB).
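As a rough illustration of that workaround, these are the relevant settings you would pass at submit time (the multiplier/quantile values and the application jar name here are hypothetical, tune them for your workload):

```shell
# Sketch: enable speculation so stragglers are re-launched, and cap
# the number of task failures so a repeatedly re-launched task
# eventually kills the job. Values below are illustrative only.
spark-submit \
  --conf spark.speculation=true \
  --conf spark.speculation.multiplier=1.5 \
  --conf spark.speculation.quantile=0.75 \
  --conf spark.task.maxFailures=4 \
  your_job.jar
```

Note that this only re-launches tasks that are slow *relative to their peers*; it is not a true wall-clock timeout for the whole job.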
On the other hand, you could use the Spark web UI's REST API:
1) Get the running jobs: GET http://SPARK_CLUSTER_PROD/api/v1/applications/application_1502112083252_1942/jobs?status=running
(this returns an array of jobs with a submissionTime
field that you can use to find long-running jobs)
2) Kill the job: POST http://SPARK_CLUSTER_PROD/stages/stage/kill/?id=23881&terminate=true
for each of the job's stages.
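The two steps above can be sketched as a small watchdog. This is a minimal sketch, not an official API client: the base URL, the application id, and the exact submissionTime format ("...GMT" suffix) are assumptions taken from the examples above, so verify them against your cluster's actual responses.

```python
import datetime as dt

# Hypothetical values taken from the URLs in the answer above.
SPARK_UI = "http://SPARK_CLUSTER_PROD"
APP_ID = "application_1502112083252_1942"

def parse_submission_time(ts):
    # Assumes timestamps like "2017-08-07T10:00:00.000GMT", as returned
    # by the /api/v1 jobs endpoint; strip the "GMT" suffix and parse.
    return dt.datetime.strptime(ts.replace("GMT", ""), "%Y-%m-%dT%H:%M:%S.%f")

def jobs_to_kill(running_jobs, timeout, now):
    """Return (jobId, stageIds) pairs for jobs running longer than `timeout`.

    `running_jobs` is the parsed JSON array from step 1
    (GET .../api/v1/applications/<app-id>/jobs?status=running).
    """
    overdue = []
    for job in running_jobs:
        started = parse_submission_time(job["submissionTime"])
        if now - started > timeout:
            overdue.append((job["jobId"], job["stageIds"]))
    return overdue

def kill_urls(overdue):
    # Step 2: one POST per stage of each overdue job.
    return [f"{SPARK_UI}/stages/stage/kill/?id={sid}&terminate=true"
            for _, stage_ids in overdue for sid in stage_ids]
```

You would fetch the jobs JSON (with urllib or requests), pass it to `jobs_to_kill`, and POST each URL from `kill_urls`, for example from a cron job that acts as the missing timeout.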
I believe Spark also has a hidden API that you can try to use.