 

How to Stop running Spark Streaming application Gracefully?

How do I stop Spark Streaming? My Spark Streaming job runs continuously, and I want to stop it in a graceful manner.

I have seen the following option for shutting down a streaming application:

sparkConf.set("spark.streaming.stopGracefullyOnShutdown", "true")

Spark configuration: available properties

But how do I update this parameter on a running application?

AKC asked Oct 12 '16

People also ask

Do Spark Streaming programs run continuously?

Users specify a streaming computation by writing a batch computation (using Spark's DataFrame/Dataset API), and the engine automatically incrementalizes this computation (runs it continuously).

What should be done to stop only the StreamingContext?

To stop only the StreamingContext, set the optional parameter of stop() called stopSparkContext to false. A SparkContext can be re-used to create multiple StreamingContexts, as long as the previous StreamingContext is stopped (without stopping the SparkContext) before the next StreamingContext is created.
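A minimal sketch of that reuse pattern, assuming a Scala driver (the app name and batch intervals are illustrative; `StreamingContext.stop(stopSparkContext = ...)` is the Spark Streaming API parameter referred to above):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("ReuseSparkContextDemo")
val sc = new SparkContext(conf)

// First StreamingContext on top of the shared SparkContext.
val ssc1 = new StreamingContext(sc, Seconds(10))
// ... define streams, ssc1.start(), process for a while ...

// Stop only the StreamingContext; the SparkContext stays alive.
ssc1.stop(stopSparkContext = false)

// The same SparkContext can now back a new StreamingContext.
val ssc2 = new StreamingContext(sc, Seconds(30))
```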

Is Spark Streaming deprecated?

Now that the Direct API of Spark Streaming (we currently have version 2.3.2) is deprecated, and we recently added the Confluent Platform (which comes with Kafka 2.2.0) to our project, we plan to migrate these applications.

How do I restart my Spark Streaming application?

In the MQTT callback, stop the streaming context with ssc.stop(true, true), which will gracefully shut down the streams and the underlying SparkContext. Start the Spark application again by creating a new SparkConf and setting up the streams by reading the config file.
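A hedged sketch of that callback-driven stop-and-restart pattern (the callback wiring and config re-reading are hypothetical placeholders; only `ssc.stop(stopSparkContext, stopGracefully)` is the actual Spark Streaming API):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Called from an external trigger (e.g. an MQTT message handler):
// stop the streams AND the underlying SparkContext, gracefully.
def onShutdownMessage(ssc: StreamingContext): Unit = {
  ssc.stop(stopSparkContext = true, stopGracefully = true)
}

// To restart, rebuild everything from scratch.
def restart(): StreamingContext = {
  val conf = new SparkConf().setAppName("RestartableStreamingApp")
  val ssc = new StreamingContext(conf, Seconds(10))
  // ... re-create the input streams from the (re-read) config file ...
  ssc.start()
  ssc
}
```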


1 Answer

Have a look at this blog post. It is the "nicest" way to gracefully terminate a streaming job that I have come across.

How to pass the shutdown signal:

Now we know how to ensure a graceful shutdown in Spark Streaming. But how can we pass the shutdown signal to the Spark Streaming application? One naive option is to press CTRL+C in the terminal where the driver program runs, but obviously that is not a good option. The solution I am using is to grep for the driver process of the Spark Streaming application and send it a SIGTERM signal. When the driver receives this signal, it initiates the graceful shutdown of the application. We can put the command below in a shell script and run the script to pass the shutdown signal:

ps -ef | grep spark | grep <driver-program-name> | awk '{print $2}' | xargs kill -SIGTERM

e.g. ps -ef | grep spark | grep DataPipelineStreamDriver | awk '{print $2}' | xargs kill -SIGTERM
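For completeness, a minimal sketch of the driver side that makes the SIGTERM above trigger a graceful stop (the app name and batch interval are illustrative; the config key is the one from the question):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("DataPipelineStreamDriver")
  // On SIGTERM, Spark's shutdown hook stops the streams gracefully
  // (finishing in-flight batches) instead of killing them mid-batch.
  .set("spark.streaming.stopGracefullyOnShutdown", "true")

val ssc = new StreamingContext(conf, Seconds(10))
// ... define input streams and transformations here ...
ssc.start()
ssc.awaitTermination() // blocks until the shutdown hook stops the context
```

Note that this property must be set before the application starts; it cannot be changed on a running job, which is why the SIGTERM approach relies on it being enabled up front.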

Glennie Helles Sindholt answered Oct 19 '22