There has been a question on this topic, and the answer said: "The acknowledgement will be made once the message is durably persisted somewhere in the Dataflow pipeline."
Conceptually that makes sense, but I am not sure how Dataflow can track a message after it has been deserialized and transformed in the pipeline but before its payload is persisted.
In our case, the PubSub message contains a batch of items. After the message is received and deserialized, we break the batch down for processing. Eventually, each item in the batch is either discarded or committed to Datastore, depending on its timestamp.
How does the acknowledgement work in this situation?
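For concreteness, here is a minimal sketch of what our pipeline does (Apache Beam Python, streaming mode). The topic path and the `deserialize_batch` / `is_fresh` helpers are hypothetical stand-ins for our actual deserialization and timestamp logic:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def deserialize_batch(message_bytes):
    # One PubSub message carries a JSON-encoded batch of items;
    # fan the items out as individual pipeline elements.
    for item in json.loads(message_bytes.decode("utf-8")):
        yield item

def is_fresh(item, cutoff_ms=0):
    # Hypothetical timestamp check: keep only items newer than a cutoff.
    return item.get("timestamp_ms", 0) >= cutoff_ms

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    _ = (
        p
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/my-topic")
        | "FanOutBatch" >> beam.FlatMap(deserialize_batch)
        | "DropStale" >> beam.Filter(is_fresh)
        # ... a final step builds entities and commits them to Datastore.
    )
```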
Dataflow executes your code in bundles. After successful execution, each bundle is committed so that successfully processed elements are not re-executed. Bundles are not necessarily committed between every step in the pipeline; see the description of fusion optimization for details about when PCollections are materialized and committed.
For PubSub, messages that were read as part of a bundle will be acknowledged as part of committing the completion of that bundle. This means that if you look at the PubSub read step and any ParDos after it, these will be executed (and committed) together.

Adding a GroupByKey after the PubSub read allows messages to be acknowledged to PubSub as soon as the bundles are committed to the GroupByKey.
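Putting that together, one way to get earlier acknowledgements is to insert a Reshuffle (which is built on top of a GroupByKey) immediately after the read. A hedged sketch, reusing the hypothetical `deserialize_batch` helper and topic path from the earlier example:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def deserialize_batch(message_bytes):
    # Same hypothetical helper as above: fan out the batch into items.
    for item in json.loads(message_bytes.decode("utf-8")):
        yield item

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    _ = (
        p
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/my-topic")
        # Reshuffle is GroupByKey-based, so it breaks fusion and creates a
        # commit point: once the read bundle is committed into the shuffle,
        # the PubSub messages can be acknowledged, independent of how long
        # the downstream steps take.
        | "CommitPoint" >> beam.Reshuffle()
        | "FanOutBatch" >> beam.FlatMap(deserialize_batch)
        # ... remaining filtering and Datastore steps as before.
    )
```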