Could someone please share the syntax to read/write a BigQuery table in a pipeline written in Python for GCP Dataflow?
Dataflow is a managed service for executing a wide variety of data processing patterns. These pipelines are created using the Apache Beam programming model which allows for both batch and streaming processing.
A PCollection represents a potentially distributed, multi-element dataset that acts as the pipeline's data. Apache Beam transforms use PCollection objects as inputs and outputs for each step in your pipeline.
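As a quick, self-contained illustration (unrelated to BigQuery, using the local default runner), a PCollection can be created from in-memory data and passed from one transform to the next:
import apache_beam as beam

# beam.Create produces a PCollection; each transform consumes one
# PCollection and yields a new one.
with beam.Pipeline() as p:
    numbers = p | beam.Create([1, 2, 3])
    squares = numbers | beam.Map(lambda x: x * x)
    squares | beam.Map(print)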
Run on Dataflow
First, construct a Pipeline with the following options for it to run on GCP Dataflow:
import apache_beam as beam

options = {'project': <project>,
           'runner': 'DataflowRunner',
           'region': <region>,
           'setup_file': <setup.py file>}
pipeline_options = beam.pipeline.PipelineOptions(flags=[], **options)
pipeline = beam.Pipeline(options=pipeline_options)
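Once the read/write transforms below have been attached, the pipeline still has to be launched explicitly. A minimal sketch of that final step:
# Submit the job to the configured runner (DataflowRunner here);
# wait_until_finish() blocks until the Dataflow job completes.
result = pipeline.run()
result.wait_until_finish()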
Read from BigQuery
Define a BigQuerySource with your query and use beam.io.Read to read data from BQ:
BQ_source = beam.io.BigQuerySource(query=<query>)
BQ_data = pipeline | beam.io.Read(BQ_source)
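The rows arrive as Python dictionaries keyed by column name, so they can be handed straight to ordinary transforms. A sketch with a hypothetical query and column names (BigQuerySource defaults to legacy SQL; pass use_standard_sql=True for standard SQL):
# Hypothetical legacy-SQL query against a public dataset.
BQ_source = beam.io.BigQuerySource(
    query='SELECT name, year FROM [bigquery-public-data:usa_names.usa_1910_2013]')
BQ_data = pipeline | 'ReadFromBQ' >> beam.io.Read(BQ_source)

# Each element is a dict like {'name': ..., 'year': ...}.
names = BQ_data | 'ExtractName' >> beam.Map(lambda row: row['name'])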
Write to BigQuery
There are two options to write to BigQuery:
1. Use a BigQuerySink with beam.io.Write:
BQ_sink = beam.io.BigQuerySink(<table>, dataset=<dataset>, project=<project>)
BQ_data | beam.io.Write(BQ_sink)
2. Use beam.io.WriteToBigQuery:
BQ_data | beam.io.WriteToBigQuery(<table>, dataset=<dataset>, project=<project>)
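In practice, WriteToBigQuery is usually also given a schema and create/write dispositions so the target table can be created if it does not exist. A minimal sketch with placeholder table and field names:
BQ_data | 'WriteToBQ' >> beam.io.WriteToBigQuery(
    '<table>',
    dataset='<dataset>',
    project='<project>',
    # Placeholder schema; each input element must be a dict whose keys
    # match these field names.
    schema='name:STRING,year:INTEGER',
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)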