When submitting a Spark Streaming program using spark-submit (YARN mode), it keeps polling the status and never exits.
Is there any option in spark-submit to exit right after submission?
===Why this troubles me===
The streaming program will run forever, and I don't need the status updates.
I can press Ctrl+C to stop it when I start it manually, but I have many streaming contexts to start and I need to start them from a script.
I can put the spark-submit process in the background, but after many background Java processes are created, the corresponding user can no longer run any other Java process, because the JVM cannot create a GC thread.
When you run spark-submit, a driver program is launched; it requests resources from the cluster manager, and at the same time it initiates the main program of your application.
You can submit a Spark batch application in cluster mode (the default) or client mode, either from inside the cluster or from an external client. In cluster mode, the driver runs on a host in your driver resource group; the spark-submit syntax is --deploy-mode cluster.
In general, if you want a process to keep running, you can launch it in the background. In your case, the job will continue running until you specifically kill it with yarn application -kill. So even if you kill the spark-submit process, the application will continue to run, since YARN manages it after submission.
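For example, you can look up the running application and stop it through YARN directly; the application ID below is illustrative:

```shell
# List running YARN applications to find the one you want to stop
yarn application -list

# Kill the streaming job by its application ID (example ID shown)
yarn application -kill application_1234567890123_0001
```

These commands require access to the YARN cluster, so they are shown here as a sketch rather than something you can run locally.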
I know this is an old question, but there is now a way to do this: set --conf spark.yarn.submit.waitAppCompletion=false when you run spark-submit. With this, the client will exit after successfully submitting the application. From the documentation:
In YARN cluster mode, controls whether the client waits to exit until the application completes. If set to true, the client process will stay alive reporting the application's status. Otherwise, the client process will exit after submission.
Also, you may need to set --deploy-mode to cluster:
In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application.
More at https://spark.apache.org/docs/latest/running-on-yarn.html
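Putting the two flags together, a submission that returns as soon as the application is accepted might look like this (the class name and jar path are placeholders for your own job):

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.submit.waitAppCompletion=false \
  --class com.example.MyStreamingJob \
  /path/to/my-streaming-job.jar
```

With waitAppCompletion=false, the client prints the application ID and exits, which makes this safe to call from a loop in a startup script.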
Interesting. I never thought about this issue. Not sure there is a clean way to do this, but I simply kill the submit process on the machine, and the YARN job continues to run until you stop it specifically. So you can create a script that executes the spark-submit and then kills it. When you actually want to stop the job, use yarn application -kill. Dirty, but it works.
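A sketch of that workaround, assuming the application name is distinctive enough to find later (all names and paths here are illustrative, and the 30-second wait is a guess at how long submission takes on your cluster):

```shell
#!/bin/sh
# Launch spark-submit in the background, then kill the local client
# process once the application has been handed off to YARN.
spark-submit --master yarn --deploy-mode cluster \
  --name my-streaming-job \
  --class com.example.MyStreamingJob \
  /path/to/my-streaming-job.jar &
SUBMIT_PID=$!

# Give the client time to submit the application, then kill it.
# The job itself keeps running under YARN.
sleep 30
kill "$SUBMIT_PID"

# Later, to stop the job for real:
#   yarn application -list | grep my-streaming-job   # find the app ID
#   yarn application -kill <application-id>
```

The waitAppCompletion approach above is cleaner; this script is only useful on Spark versions where that setting is unavailable.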