Is it possible to perform an action once a batch Dataflow job has finished processing all data? Specifically, I'd like to move the text file that the pipeline just processed to a different GCS bucket. I'm not sure where to place that in my pipeline to ensure it executes once after the data processing has completed.
Hi @sɐunıɔןɐqɐp, there is a Clone option at the top of the job page in the Dataflow dashboard. It simply clones the current job and lets you edit and run it again.
To either drain or cancel a Dataflow job, you can use the gcloud dataflow jobs command in Cloud Shell or in a local terminal that has the gcloud CLI installed.
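For example (JOB_ID and the region are placeholders you would replace with your own values):

# find the ID of the running job
gcloud dataflow jobs list --region=us-central1 --status=active
# drain it (finish in-flight work, accept no new input) or cancel it outright
gcloud dataflow jobs drain JOB_ID --region=us-central1
gcloud dataflow jobs cancel JOB_ID --region=us-central1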
I don't see why you need to do this as a post-pipeline-execution step. You could use side outputs to write the file to multiple buckets, and save yourself the copy after the pipeline finishes.
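A rough sketch of that side-output idea in Beam Java, assuming a Pipeline named pipeline already exists; the bucket names and the toUpperCase() "processing" are just placeholders:

import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

// One tag per output: the processed records and an untouched copy of the input.
final TupleTag<String> processedTag = new TupleTag<String>() {};
final TupleTag<String> rawCopyTag = new TupleTag<String>() {};

PCollection<String> lines =
    pipeline.apply(TextIO.read().from("gs://source-bucket/input.txt"));

PCollectionTuple outputs = lines.apply(
    ParDo.of(new DoFn<String, String>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        c.output(c.element().toUpperCase());   // main output: placeholder "processing"
        c.output(rawCopyTag, c.element());     // side output: unmodified copy of the input
      }
    }).withOutputTags(processedTag, TupleTagList.of(rawCopyTag)));

// Each output is written to its own bucket in the same pipeline run,
// so there is nothing left to copy after the job finishes.
outputs.get(processedTag).apply(TextIO.write().to("gs://results-bucket/processed"));
outputs.get(rawCopyTag).apply(TextIO.write().to("gs://archive-bucket/raw-copy"));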
If that's not going to work for you (for whatever reason), then you can simply run your pipeline in blocking execution mode, i.e. use pipeline.run().waitUntilFinish(), and then just write the rest of your code (which does the copy) after that.
[..]
// do some stuff before the pipeline runs
Pipeline pipeline = ...
pipeline.run().waitUntilFinish();
// do something after the pipeline finishes here
[..]
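Expanding that skeleton into a rough sketch of what the question asks for (moving the processed file once the job is done): the bucket and object names below are made up, and the copy/delete uses the google-cloud-storage client library rather than anything Dataflow-specific.

import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;

Pipeline pipeline = ...  // build the pipeline that reads gs://source-bucket/input.txt

// Block until the Dataflow job has finished processing all data.
PipelineResult.State state = pipeline.run().waitUntilFinish();

if (state == PipelineResult.State.DONE) {
  // "Move" the processed file: copy it to the other bucket, then delete the original.
  Storage storage = StorageOptions.getDefaultInstance().getService();
  BlobId source = BlobId.of("source-bucket", "input.txt");
  BlobId target = BlobId.of("archive-bucket", "input.txt");
  storage.copy(Storage.CopyRequest.of(source, target)).getResult();
  storage.delete(source);
}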
A little trick I got from reading the source code of Apache Beam's PassThroughThenCleanup.java.

Right after your reader, create a side input that "combines" the entire collection (in the source code it is the View.asIterable() PTransform) and connect its output to a DoFn. This DoFn will be called only after the reader has finished reading ALL elements.

P.S. The code literally names the operation cleanupSignalView, which I found really clever.
Note that you can achieve the same effect using Combine.globally() (Java) or beam.CombineGlobally() (Python). For more info, check out section 4.2.4.3 here.
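A minimal sketch of that pattern in Java, assuming a Pipeline named pipeline already exists and the paths are placeholders; the cleanup DoFn takes the View.asIterable() result as a side input, so the runner cannot invoke it until the read has completed:

import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

PCollection<String> lines =
    pipeline.apply(TextIO.read().from("gs://source-bucket/input.txt"));

// Side input over the whole collection; it only becomes available once the
// reader has emitted every element.
final PCollectionView<Iterable<String>> cleanupSignalView =
    lines.apply(View.<String>asIterable());

// A single dummy element triggers the cleanup DoFn; the side input forces the
// runner to wait until reading has finished before invoking it.
pipeline
    .apply("CleanupTrigger", Create.of("go"))
    .apply("Cleanup", ParDo.of(new DoFn<String, Void>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        c.sideInput(cleanupSignalView);  // establishes the dependency on the completed read
        // Move or copy the processed file to another bucket here.
      }
    }).withSideInputs(cleanupSignalView));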