I have a Dataflow streaming job with a Pub/Sub subscription as an unbounded source. I want to know at what stage Dataflow acks the incoming Pub/Sub message. It appears to me that the message is lost if an exception is thrown during any stage of the pipeline.
I'd also like to know the best practices for writing a Dataflow pipeline with a Pub/Sub unbounded source so that messages can be recovered on failure. Thank you!
If a subscription exists for the topic before a message is published to it, subscribers can pull that message from the subscription. The subscriber then has a predetermined amount of time (the ack deadline, which defaults to 10 seconds) to acknowledge (ack) the message before the server redelivers it.
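For illustration, here is a minimal sketch of creating a pull subscription with an explicit ack deadline using the Java Pub/Sub client. The project, topic, and subscription names (my-project, my-topic, my-sub) are placeholders:

```java
import com.google.cloud.pubsub.v1.SubscriptionAdminClient;
import com.google.pubsub.v1.PushConfig;
import com.google.pubsub.v1.SubscriptionName;
import com.google.pubsub.v1.TopicName;

public class CreateSubscriptionExample {
  public static void main(String[] args) throws Exception {
    String projectId = "my-project"; // placeholder: your GCP project
    try (SubscriptionAdminClient admin = SubscriptionAdminClient.create()) {
      admin.createSubscription(
          SubscriptionName.of(projectId, "my-sub"),
          TopicName.of(projectId, "my-topic"),
          PushConfig.getDefaultInstance(), // empty push config => pull subscription
          10); // ackDeadlineSeconds: the 10-second default mentioned above
    }
  }
}
```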
Pull the messages from the subscription: In the Google Cloud console, go to the Pub/Sub subscriptions page, open the subscription, and in the Messages tab click Pull.
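The same pull can be done programmatically. Here is a minimal sketch using the Java client's asynchronous pull API, again with placeholder project and subscription names:

```java
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;

public class PullExample {
  public static void main(String[] args) {
    ProjectSubscriptionName subscription =
        ProjectSubscriptionName.of("my-project", "my-sub"); // placeholders
    MessageReceiver receiver =
        (PubsubMessage message, AckReplyConsumer consumer) -> {
          System.out.println("Received: " + message.getData().toStringUtf8());
          consumer.ack(); // ack within the deadline, or the message is redelivered
        };
    Subscriber subscriber = Subscriber.newBuilder(subscription, receiver).build();
    subscriber.startAsync().awaitRunning();
    // ... let it run; call subscriber.stopAsync() to shut down
  }
}
```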
The Dataflow Streaming Runner acks Pub/Sub messages received by a bundle after the bundle has succeeded and the results of the bundle (outputs, state mutations, etc.) have been durably committed. Failed bundles are retried until they succeed, and don't cause data loss. If you believe that data loss may be happening, please include details (the job ID and the reasoning that led you to conclude that data has been dropped because of the failures) and we'll investigate.
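A consequence of this retry behavior is that a permanently bad message can cause its bundle to fail and retry indefinitely. A common pattern (something you build yourself, not something the runner does for you) is to catch exceptions inside your DoFn and route failing elements to a dead-letter output instead of throwing. A sketch assuming hypothetical topic and subscription names and a placeholder parse step:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

public class DeadLetterExample {
  static final TupleTag<String> MAIN = new TupleTag<String>() {};
  static final TupleTag<String> DEAD = new TupleTag<String>() {};

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(); // pass DataflowPipelineOptions in a real job

    PCollectionTuple results =
        p.apply(PubsubIO.readStrings()
                .fromSubscription("projects/my-project/subscriptions/my-sub"))
         .apply(ParDo.of(new DoFn<String, String>() {
                  @ProcessElement
                  public void processElement(ProcessContext c) {
                    try {
                      c.output(parse(c.element()));
                    } catch (Exception e) {
                      // Route the failing element instead of throwing, so the
                      // bundle commits and the message is acked.
                      c.output(DEAD, c.element());
                    }
                  }
                }).withOutputTags(MAIN, TupleTagList.of(DEAD)));

    results.get(MAIN)
        .apply(PubsubIO.writeStrings().to("projects/my-project/topics/out"));
    results.get(DEAD)
        .apply(PubsubIO.writeStrings().to("projects/my-project/topics/dead-letter"));
    p.run();
  }

  // Hypothetical parse step standing in for your per-element processing.
  static String parse(String raw) {
    return raw;
  }
}
```

The dead-letter topic gives you the "message retrieval on failure" the question asks about: bad messages are acked and preserved for inspection or replay, while transient failures still benefit from the runner's bundle retries.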