
How to exit spark-submit after the submission

When submitting a Spark Streaming program using spark-submit (YARN mode), it keeps polling the application status and never exits.

Is there an option in spark-submit to exit right after the submission?

===Why this troubles me===

The streaming program will run forever and I don't need the status updates.

I can Ctrl+C to stop it if I start it manually, but I have lots of streaming contexts to start and I need to start them using a script.

I can put the spark-submit process in the background, but after many background Java processes are created, the corresponding user will no longer be able to run any other Java process because the JVM cannot create GC threads.
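For illustration, the scripted launch looks roughly like this (the job list and paths are placeholders): each backgrounded spark-submit leaves a client JVM polling YARN, and those lingering clients are what eventually exhaust the user's thread limit.

```bash
# Rough sketch of the launcher script described above.
# Every '&' leaves a spark-submit client JVM alive, polling YARN.
for jar in /jobs/stream-job-*.jar; do
  spark-submit --master yarn "$jar" &
done
```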

Peter Chan asked May 13 '16 02:05

People also ask

What happens after submitting spark job?

Once you run spark-submit, a driver program is launched; it requests resources from the cluster manager, and at the same time the main program of the user application is started by the driver.

How do I run spark submit in client mode?

You can submit a Spark batch application in cluster mode (default) or client mode, either from inside the cluster or from an external client. In cluster mode (default), the driver runs on a host in your driver resource group; the spark-submit syntax is --deploy-mode cluster.

How do I run a spark job in the background?

In general, if you want a process to keep running, you can launch it in the background. In your case, the job will continue running until you specifically kill it with yarn application -kill. So even if you kill the spark-submit process, the job will keep running, since YARN is managing it after submission.
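For reference, a minimal sketch of stopping such a job through YARN (the application id below is a placeholder):

```bash
# List running YARN applications to find the streaming job's id
yarn application -list -appStates RUNNING

# Kill the job by its application id (placeholder value)
yarn application -kill application_1463000000000_0001
```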


2 Answers

I know this is an old question, but there's a way to do this now: set --conf spark.yarn.submit.waitAppCompletion=false when you run spark-submit. With this, the client will exit after successfully submitting the application.

In YARN cluster mode, controls whether the client waits to exit until the application completes. If set to true, the client process will stay alive reporting the application's status. Otherwise, the client process will exit after submission.

Also, you may need to set --deploy-mode to cluster:

In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application.

More at https://spark.apache.org/docs/latest/running-on-yarn.html
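As a sketch, an invocation combining both settings might look like this (the class name, jar path, and master URL are placeholders, not from the question):

```bash
# Submit a streaming app and return as soon as YARN accepts it;
# the client does not stay alive polling the application status.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.submit.waitAppCompletion=false \
  --class com.example.StreamingJob \
  /path/to/streaming-job.jar
```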

Mateusz Dymczyk answered Sep 21 '22 04:09


Interesting. I never thought about this issue. I'm not sure there is a clean way to do this, but I simply kill the submit process on the machine and the YARN job continues to run until you stop it explicitly. So you can create a script that executes spark-submit and then kills it. When you actually want to stop the job, use yarn application -kill. Dirty, but it works.
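A rough sketch of such a script, with placeholder class/jar names and an arbitrary wait before killing the local client:

```bash
# Launch spark-submit in the background, wait for the application to be
# accepted by YARN, then kill the local client; the YARN application
# itself keeps running.
spark-submit --master yarn --deploy-mode cluster \
  --class com.example.StreamingJob /path/to/streaming-job.jar &
SUBMIT_PID=$!
sleep 60            # crude wait for YARN to accept the application
kill "$SUBMIT_PID"

# Later, to actually stop the job:
# yarn application -kill <applicationId>
```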

z-star answered Sep 22 '22 04:09