
At what stage does Dataflow/Apache Beam ack a pub/sub message?

I have a Dataflow streaming job with a Pub/Sub subscription as an unbounded source. I want to know at what stage Dataflow acks an incoming Pub/Sub message. It appears to me that the message is lost if an exception is thrown during any stage of the Dataflow pipeline.

I'd also like to know the best practices for writing a Dataflow pipeline with a Pub/Sub unbounded source so that messages can be recovered on failure. Thank you!

Kakaji asked Oct 13 '17 02:10





1 Answer

The Dataflow Streaming Runner acks the Pub/Sub messages received by a bundle only after the bundle has succeeded and its results (outputs, state mutations, etc.) have been durably committed. Failed bundles are retried until they succeed, so they don't cause data loss. If you believe that data loss may be happening, please include details (the job ID and the reasoning that led you to conclude that data was dropped because of the failures) and we'll investigate.
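To make the ordering concrete, here is a minimal toy model in plain Python. It is not Dataflow or Beam code: `ToySubscription`, `run_bundle`, and `flaky` are hypothetical names invented for illustration, and the `max_attempts` cap exists only so the sketch terminates (the description above says failed bundles are retried until they succeed). It only shows the sequence process, commit, then ack, and that a failed attempt leaves the messages unacked.

```python
from typing import Callable, Dict, List, Optional


class ToySubscription:
    """Hypothetical in-memory stand-in for a Pub/Sub subscription."""

    def __init__(self, messages: List[str]) -> None:
        # ack_id -> payload; a message stays here until it is acked.
        self.unacked: Dict[int, str] = dict(enumerate(messages))

    def pull(self) -> Dict[int, str]:
        # Return a copy so acking does not mutate the bundle in flight.
        return dict(self.unacked)

    def ack(self, ack_ids) -> None:
        for ack_id in list(ack_ids):
            self.unacked.pop(ack_id, None)


def run_bundle(
    sub: ToySubscription,
    process: Callable[[str], str],
    sink: List[str],
    max_attempts: int = 5,  # assumption: cap retries so the toy terminates
) -> Optional[int]:
    """Process one bundle; ack only after its outputs are committed.

    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    bundle = sub.pull()
    for attempt in range(1, max_attempts + 1):
        try:
            outputs = [process(payload) for payload in bundle.values()]
        except Exception:
            # Bundle failed: nothing is committed and nothing is acked,
            # so the same messages are processed again on the next attempt.
            continue
        sink.extend(outputs)    # "durable commit" of the bundle's results
        sub.ack(bundle.keys())  # ack happens strictly after the commit
        return attempt
    return None


# Usage: a step that throws on its first two calls, then succeeds.
calls = {"n": 0}


def flaky(payload: str) -> str:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RuntimeError("transient failure")
    return payload.upper()


sub = ToySubscription(["a", "b"])
sink: List[str] = []
attempts = run_bundle(sub, flaky, sink)
print(attempts, sink, sub.unacked)
```

In this model an exception never removes a message from `unacked`; the message only leaves the subscription once the bundle containing it has committed its results.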

jkff answered Sep 20 '22 05:09
