
How to manually commit offset in Spark Kafka direct streaming?

I looked around hard but didn't find a satisfactory answer to this. Maybe I'm missing something. Please help.

We have a Spark streaming application consuming a Kafka topic, which needs to ensure end-to-end processing before advancing Kafka offsets, e.g. updating a database. This is much like building transaction support within the streaming system, and guaranteeing that each message is processed (transformed) and, more importantly, output.

I have read about Kafka direct streams. The documentation says that for robust failure recovery in direct-streaming mode, Spark checkpointing should be enabled, which stores the offsets along with the checkpoints. But the offset management is done internally (by setting Kafka config params like ["auto.offset.reset", "auto.commit.enable", "auto.offset.interval.ms"]). It does not say how (or whether) we can customize committing offsets (once we've updated the database, for example). In other words, can we set "auto.commit.enable" to false and manage the offsets (not unlike a DB connection) ourselves?
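(For reference: the later spark-streaming-kafka-0-10 integration exposes exactly this pattern. With `enable.auto.commit` set to `false`, you can read the consumed offset ranges from each batch's RDD and commit them back to Kafka yourself only after your output step has succeeded. A minimal sketch; the broker address, topic, group id, the `ssc` StreamingContext, and the `saveToDatabase` function are placeholders:)

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "localhost:9092",          // placeholder broker
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "my-consumer-group",       // placeholder group
  "auto.offset.reset"  -> "earliest",
  "enable.auto.commit" -> (false: java.lang.Boolean) // we commit manually
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent,
  Subscribe[String, String](Seq("my-topic"), kafkaParams))

stream.foreachRDD { rdd =>
  // Offset ranges for this batch, captured before any output action.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  saveToDatabase(rdd)  // placeholder: your end-to-end output step
  // Commit only after the output has succeeded (at-least-once semantics).
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}
```

This keeps the offsets in Kafka's own commit log, but advances them only on your schedule; a failure before `commitAsync` simply replays the batch.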

Any guidance/help is greatly appreciated.

asked Jul 28 '16 by TroubleShooter


People also ask

How do you manually commit offset in Kafka?

Using the Kafka consumer API: manually assign a list of partitions to the consumer (or get the set of partitions currently assigned to it), poll for records, and then commit the offsets returned on the last poll() for all subscribed topics and partitions. Close the consumer when done, waiting for any needed cleanup.

How does spark streaming record its offset?

Offsets are tracked by Spark Streaming within its checkpoints. This eliminates inconsistencies between Spark Streaming and Zookeeper/Kafka, and so each record is received by Spark Streaming effectively exactly once despite failures.

What does it mean to commit an offset in Kafka?

It commits the offset, indicating that all the previous records from that partition have been processed. So, if a consumer stops and comes back later, it restarts from the last committed position (if assigned to that partition again). Note that this behavior is configurable.
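This resume-from-last-commit behavior can be simulated without a broker. A minimal sketch (no Kafka involved): the committed offset is the position of the next record to read, so processing after a restart picks up exactly where the last commit left off.

```scala
// Pure-logic simulation of Kafka commit semantics for one partition.
object CommitSemantics {
  // `committed` holds the offset of the next unread record.
  final case class PartitionState(records: Vector[String], var committed: Long = 0L)

  // Read up to `n` records starting at the committed offset, then commit.
  def pollAndCommit(p: PartitionState, n: Int): Vector[String] = {
    val batch = p.records.slice(p.committed.toInt, p.committed.toInt + n)
    // ... process the batch here (e.g. write it to a database) ...
    p.committed += batch.size // commit: advance to the next unread record
    batch
  }
}
```

A consumer that stops after the first call and "comes back later" continues from the committed position, not from the beginning.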

Does Kafka producer commit offset?

So, Kafka will commit your current offset every five seconds. Auto-commit is a convenient option, but it may cause records to be processed a second time. Let us understand it with an example: you have some messages in the partition, and you have made your first poll request.


1 Answer

The article below could be a good starting point for understanding the approach.

spark-kafka-achieving-zero-data-loss

Furthermore:

The article suggests using the ZooKeeper client directly, which could also be replaced by something like KafkaSimpleConsumer. The advantage of using ZooKeeper (or KafkaSimpleConsumer) is that monitoring tools which depend on ZooKeeper-saved offsets keep working. The offset information can also be saved on HDFS or any other reliable storage service.
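Whichever store you pick (ZooKeeper, HDFS, a database row), the bookkeeping is the same: persist a topic/partition-to-offset map after a successful batch, and read it back on restart to resume from those offsets. A minimal sketch of the serialization half, with the actual store left to whatever client you use:

```scala
// Serialize per-partition offsets to a plain text blob that can be written
// to ZooKeeper, HDFS, or a database column, and parsed back on restart.
object OffsetStore {
  // One "topic,partition,offset" line per partition, in stable order.
  def serialize(offsets: Map[(String, Int), Long]): String =
    offsets.toSeq
      .sortBy { case ((topic, part), _) => (topic, part) }
      .map { case ((topic, part), off) => s"$topic,$part,$off" }
      .mkString("\n")

  // Inverse of serialize: rebuild the map to resume consumption from.
  def deserialize(s: String): Map[(String, Int), Long] =
    s.split("\n").filter(_.nonEmpty).map { line =>
      val Array(topic, part, off) = line.split(",")
      ((topic, part.toInt), off.toLong)
    }.toMap
}
```

On restart, the deserialized map is what you would feed to the direct stream as its starting offsets.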

answered Nov 09 '22 by rakesh