
When does Dataflow acknowledge a message of batched items from PubSubIO?

An earlier question on this topic received the answer: "The acknowledgement will be made once the message is durable persisted somewhere in the Dataflow pipeline."

Conceptually, that makes sense, but I am not sure how Dataflow is capable of tracking a message after it has been deserialized and transformed in the pipeline before its payload is persisted.

In our case, each PubSub message contains a batch of items. After the message is received and deserialized, we break the batch down into individual items for processing. Eventually, each item in the batch is either discarded or committed to Datastore, depending on its timestamp.

How does the acknowledgement work in this situation?

M Song asked Jan 18 '17 18:01





1 Answer

Dataflow executes your code in bundles. After a bundle executes successfully, it is committed so that the elements it processed are not re-executed. Bundles are not necessarily committed between every step in the pipeline. See the description of fusion optimization for details about when PCollections are materialized and committed.

For PubSub, messages that were read as part of a bundle are acknowledged as part of committing the completion of that bundle. This means the PubSub read step and any ParDos fused after it are executed (and committed) together.

Adding a GroupByKey after the PubSub read allows messages to be acknowledged to PubSub as soon as the bundles are committed to the GroupByKey.
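
To make the two cases concrete, here is a toy model in plain Python. It is only a sketch of the behaviour described above, not the Dataflow or Beam API: `ToySubscription`, `run_bundle`, `keep_recent`, `FlakyWriter` and `CUTOFF` are all hypothetical names, and a message is modelled as a list of item timestamps. It assumes a failed bundle commits nothing and is simply retried.

```python
from dataclasses import dataclass, field
from typing import Callable, List

CUTOFF = 100  # hypothetical timestamp threshold for discarding items


@dataclass
class ToySubscription:
    """Stand-in for a subscription; each message is a batch of timestamps."""
    pending: List[List[int]]
    acked: List[List[int]] = field(default_factory=list)


def run_bundle(sub: ToySubscription,
               fused_steps: Callable[[List[int]], List[int]],
               committed: List[int]) -> bool:
    """Run all pending messages through one fused stage as a single bundle.

    Output is committed and messages are acked only if every step succeeds.
    """
    bundle = list(sub.pending)
    staged: List[int] = []
    try:
        for message in bundle:
            staged.extend(fused_steps(message))
    except RuntimeError:
        return False  # bundle failed: nothing committed, nothing acked
    committed.extend(staged)        # commit the bundle's output
    sub.acked.extend(bundle)        # ack as part of the same commit
    del sub.pending[:len(bundle)]
    return True


def keep_recent(batch: List[int]) -> List[int]:
    """Split a batch into items and discard those older than CUTOFF."""
    return [ts for ts in batch if ts >= CUTOFF]


class FlakyWriter:
    """Hypothetical sink step that fails on its first call, then succeeds."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, items: List[int]) -> List[int]:
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("transient write failure")
        return items


# Case 1: read, split, filter and write are fused into one stage.
fused_sub = ToySubscription(pending=[[90, 120, 150]])
fused_writer = FlakyWriter()
datastore: List[int] = []
first_try = run_bundle(fused_sub, lambda b: fused_writer(keep_recent(b)), datastore)
acked_after_failure = list(fused_sub.acked)  # empty: the message is still unacked
second_try = run_bundle(fused_sub, lambda b: fused_writer(keep_recent(b)), datastore)

# Case 2: a GroupByKey right after the read ends the first stage early.
gbk_sub = ToySubscription(pending=[[90, 120, 150]])
shuffle: List[int] = []
run_bundle(gbk_sub, lambda b: list(b), shuffle)  # stage 1: read + split only
acked_before_write = list(gbk_sub.acked)         # acked once the shuffle commit lands

# Stage 2 retries from the shuffled items without touching the subscription.
gbk_writer = FlakyWriter()
gbk_datastore: List[int] = []
for _ in range(2):
    try:
        gbk_datastore = gbk_writer(keep_recent(shuffle))
        break
    except RuntimeError:
        continue
```

In this model the acknowledgement follows the bundle, not the individual items: the message is acked even though the item with timestamp 90 was discarded. In the fused case a downstream failure leaves the message unacked so it can be processed again, while in the GroupByKey case the message is acked as soon as its items are committed to the shuffle, and later failures are retried from there.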

Ben Chambers answered Jan 17 '23 12:01
