Dynamic bigquery query in dataflow template

I've written a Dataflow job that works great when I run it manually. Here is the relevant section (with some validation code removed for clarity):

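import argparse
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

parser = argparse.ArgumentParser()
parser.add_argument('--end_datetime',
                    dest='end_datetime')
known_args, pipeline_args = parser.parse_known_args(argv)  # argv: args passed to run()
pipeline_options = PipelineOptions(pipeline_args)

query = <redacted SQL String with a placeholder for a date>
# The date is substituted here, while the pipeline is being constructed
query = query.replace('#ENDDATETIME#', known_args.end_datetime)

with beam.Pipeline(options=pipeline_options) as p:
    rows = p | 'read query' >> beam.io.Read(beam.io.BigQuerySource(query=query, use_standard_sql=True))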
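Because the `query.replace` call runs while the pipeline is being constructed, any validation of `--end_datetime` has to happen at that point too. Below is a minimal sketch of what the removed validation might look like, assuming the flag is an ISO 8601 datetime string; `build_query` and the sample SQL are hypothetical and are not the asker's code:

```python
from datetime import datetime

PLACEHOLDER = '#ENDDATETIME#'


def build_query(template: str, end_datetime: str) -> str:
    """Validate end_datetime and substitute it into the query template.

    Hypothetical helper: assumes the flag is an ISO 8601 datetime string.
    """
    if PLACEHOLDER not in template:
        raise ValueError(f'template has no {PLACEHOLDER} placeholder')
    # fromisoformat raises ValueError on malformed input, so a bad flag
    # fails before any pipeline is built.
    parsed = datetime.fromisoformat(end_datetime)
    # Re-serialise so only a normalised timestamp reaches the SQL text.
    return template.replace(PLACEHOLDER, parsed.isoformat(sep=' '))


if __name__ == '__main__':
    sql = "SELECT * FROM `project.dataset.events` WHERE ts < '#ENDDATETIME#'"
    print(build_query(sql, '2017-10-05T21:10:00'))
```

Re-serialising the parsed value, rather than pasting the raw flag into the SQL, means that only a well-formed timestamp can ever reach the query text.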

Now I want to create a template and schedule it to run on a regular basis with a dynamic ENDDATETIME. As I understand it, in order to do this I need to change add_argument to add_value_provider_argument per this documentation:

https://cloud.google.com/dataflow/docs/templates/creating-templates

Unfortunately, it appears that ValueProvider values are not available when I need them (while building the query string): they're only available inside the pipeline itself (please correct me if I'm wrong here). So I'm kind of stuck.
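That reading matches how Beam templates behave. A template parameter's value is supplied only when the template is launched, so calling `.get()` on a runtime value provider while the pipeline graph is being built raises an error. Any transformation of the value must therefore be deferred; Beam's own `NestedValueProvider` wraps a value provider with a translator function for this purpose. The following is a plain-Python toy model of that behaviour, not Beam's implementation, and the class names `RuntimeValue` and `DeferredValue` are hypothetical:

```python
from typing import Callable, Dict, Optional


class RuntimeValue:
    """Toy stand-in for a template parameter whose value arrives at launch."""

    # None while the pipeline is being constructed; filled in at "launch".
    runtime_options: Optional[Dict[str, str]] = None

    def __init__(self, name: str):
        self.name = name

    def is_accessible(self) -> bool:
        return RuntimeValue.runtime_options is not None

    def get(self) -> str:
        if not self.is_accessible():
            raise RuntimeError(f'{self.name}.get() called at construction time')
        return RuntimeValue.runtime_options[self.name]


class DeferredValue:
    """Applies a translator only when get() is called, i.e. at run time."""

    def __init__(self, inner: RuntimeValue, translator: Callable[[str], str]):
        self.inner = inner
        self.translator = translator

    def get(self) -> str:
        return self.translator(self.inner.get())


TEMPLATE = "SELECT * FROM t WHERE ts < '#ENDDATETIME#'"
end_datetime = RuntimeValue('end_datetime')

# Construction time: building the query eagerly fails ...
try:
    TEMPLATE.replace('#ENDDATETIME#', end_datetime.get())
except RuntimeError as exc:
    print('construction time:', exc)

# ... but a deferred query can be described now and resolved later.
query = DeferredValue(end_datetime, lambda v: TEMPLATE.replace('#ENDDATETIME#', v))

# Launch time: the runner supplies the parameter values.
RuntimeValue.runtime_options = {'end_datetime': '2017-10-05 21:10:00'}
print('run time:', query.get())
```

The catch is that the consuming source must itself accept such a deferred value. Per the answer, in the Python SDK at the time only FileBasedSource IOs did, so deferral alone does not get a runtime date into `BigQuerySource`.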

Does anyone have any pointers on how I could get a dynamic date into my query in a Dataflow template?

Mike Keyes asked Oct 05 '17 at 21:10


1 Answer

Python currently supports ValueProvider options only for FileBasedSource IOs. You can see this by clicking the Python tab under the "Pipeline I/O and runtime parameters" section of the page you linked: https://cloud.google.com/dataflow/docs/templates/creating-templates

María García Herrero answered Oct 10 '22 at 19:10