I'm curious whether a Spark Streaming application absolutely must be brought down gracefully, or whether it otherwise risks producing duplicate data via the write-ahead log. In the scenario below I outline the sequence of steps where a queue receiver interacts with a queue that requires acknowledgements for messages.
Is my understanding correct about how custom receivers should be implemented and the duplication problems that come with them, and is it normal to require a graceful shutdown?
Bottom line: It depends on your output operation.
Using the Direct API approach, introduced in Spark 1.3, eliminates inconsistencies between Spark Streaming and Kafka: each record is received by Spark Streaming effectively exactly once despite failures, because the offsets are tracked by Spark Streaming within its checkpoints.
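A minimal sketch of what that looks like with the Kafka 0.8 direct connector (broker addresses, topic name, and checkpoint path are placeholders):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("direct-api-example")
val ssc = new StreamingContext(conf, Seconds(5))
// Checkpointing is what lets Spark Streaming recover the tracked offsets after a failure.
ssc.checkpoint("hdfs:///path/to/checkpoint") // placeholder path

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
val topics = Set("my-topic")

// No receiver and no write-ahead log: the direct stream computes the offset
// ranges for each batch itself and records them in the checkpoint.
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)
```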
In order to achieve exactly-once semantics for the output of your results, the output operation that saves the data to an external data store must be either idempotent, or an atomic transaction that saves both the results and the offsets.
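A minimal sketch of that pattern, assuming the direct stream from above; `saveResults` and `saveOffsets` are hypothetical stand-ins for your data store:

```scala
import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

// Hypothetical helpers: in practice, either make saveResults idempotent
// (e.g. keyed upserts) or commit the results and the offsets in a single
// transaction so a replayed batch cannot be written twice.
def saveResults(records: Iterator[(String, String)]): Unit = ()
def saveOffsets(ranges: Array[OffsetRange]): Unit = ()

stream.foreachRDD { rdd =>
  // Capture the exact Kafka offsets backing this batch (on the driver).
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  rdd.foreachPartition { partition =>
    saveResults(partition) // runs on the executors
  }
  saveOffsets(offsetRanges) // store the offsets together with the results
}
```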
For further information on the Direct API and how to use it, check out this blog post by Databricks.