
What are the use cases for start(), awaitTermination() and stop() in Spark Streaming?

I am new to Spark Streaming. I am developing an application that fetches data from the terminal and loads it into HDFS. I searched the internet but could not work out how to stop a streaming application once it has been started.

I would also appreciate an explanation of the use cases for sc.awaitTermination() and sc.stop().

Dave asked Jun 13 '16 13:06


People also ask

What is awaitTermination in spark Streaming?

In PySpark Structured Streaming, awaitTermination(timeout=None) waits for the termination of the query, either by query.stop() or by an exception. If the query has terminated with an exception, then the exception will be thrown. If timeout is set, it returns whether the query has terminated or not within the timeout seconds.

Which of the following command is used to start Streaming in spark?

Define the streaming computations by applying transformation and output operations to DStreams. Start receiving data and processing it using streamingContext.start().

What should be done to stop only Streaming context and not the spark context?

To stop only the StreamingContext, set the optional parameter of stop() called stopSparkContext to false. A SparkContext can be re-used to create multiple StreamingContexts, as long as the previous StreamingContext is stopped (without stopping the SparkContext) before the next StreamingContext is created.

How does spark checkpoint Streaming work?

A checkpoint helps build fault-tolerant and resilient Spark applications. Spark Structured Streaming maintains an intermediate state on HDFS compatible file systems to recover from failures. To specify the checkpoint in a streaming query, we use the checkpointLocation parameter.


2 Answers

start - Until start() is called, nothing actually executes; the code before it only defines the computation. Calling start() starts the JobScheduler, which in turn starts the JobGenerator that creates the jobs.

awaitTermination - Blocks the calling thread. Internally it waits on a condition variable that tracks whether stop() was invoked explicitly in code or the application was terminated (for example with Ctrl+C).
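
The condition-variable idea can be illustrated with a small toy model in plain Python. This is only a sketch of the general pattern, not Spark's actual implementation: the class ToyStreamingContext, its method names and the batches counter are all hypothetical and exist only for this illustration.

```python
import threading
import time


class ToyStreamingContext:
    """Toy stand-in for a streaming context; NOT Spark's implementation."""

    def __init__(self):
        self._cond = threading.Condition()
        self._stopped = False
        self._worker = None
        self.batches = 0  # counts simulated batches

    def _run(self):
        # Stands in for the background scheduler that start() launches.
        while True:
            with self._cond:
                if self._stopped:
                    return
                self.batches += 1
            time.sleep(0.01)

    def start(self):
        # Work only begins here, on a background thread.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def await_termination(self, timeout=None):
        # Block until stop() sets the flag; returns False if the timeout expires.
        with self._cond:
            return self._cond.wait_for(lambda: self._stopped, timeout)

    def stop(self):
        # Set the flag and wake every thread blocked in await_termination().
        with self._cond:
            self._stopped = True
            self._cond.notify_all()
        if self._worker is not None:
            self._worker.join()


if __name__ == "__main__":
    ctx = ToyStreamingContext()
    ctx.start()
    threading.Timer(0.2, ctx.stop).start()  # another thread decides when to stop
    ctx.await_termination()                 # the main thread blocks here until then
```

The point of the sketch is that the waiting thread does no work of its own: it sleeps on the condition until some other thread calls stop().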

Prashant_M answered Oct 19 '22 22:10


Streaming queries are expected to run for a long time. Once a query is started with start(), the executors keep running while the driver sits idle. To prevent the driver process from exiting, call awaitTermination(); when it is really time to stop the query, call stop(). Because awaitTermination() blocks the calling thread, stop() has to be called from another thread or after a timed wait returns.
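
For the DStream API the question is about, the start / wait / stop lifecycle might look like the PySpark sketch below. The host, port, HDFS output prefix, marker-file path and the helper names should_stop and run_streaming_job are hypothetical placeholders. Polling for a marker file is just one way to decide when to stop and is not something the answers above prescribe.

```python
import os


def should_stop(marker_path):
    """Return True once someone has created the (hypothetical) marker file."""
    return os.path.exists(marker_path)


def run_streaming_job(host="localhost", port=9999,
                      output_prefix="hdfs:///tmp/stream/out",
                      marker_path="/tmp/stop_streaming",
                      batch_seconds=5, poll_seconds=10):
    # Imported here so this file can be loaded on a machine without Spark.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="TerminalToHdfs")
    ssc = StreamingContext(sc, batch_seconds)

    # Nothing runs yet: these lines only describe the computation.
    lines = ssc.socketTextStream(host, port)
    lines.saveAsTextFiles(output_prefix)

    ssc.start()  # begin receiving and processing data

    # Keep the driver alive, waking up periodically to look for the stop signal.
    # awaitTerminationOrTimeout returns True once the context has stopped.
    while not ssc.awaitTerminationOrTimeout(poll_seconds):
        if should_stop(marker_path):
            # Finish the data already received, then shut down Spark as well.
            # (stopGraceFully is the keyword's spelling in PySpark.)
            ssc.stop(stopSparkContext=True, stopGraceFully=True)
            break
```

A plain ssc.awaitTermination() would also keep the driver alive, but it never returns on its own, so the timed variant is used here to give the driver a chance to call stop().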

Fang Zhang answered Oct 19 '22 23:10