Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using BigQueryInsertJobOperator instead of BigQueryExecuteQueryOperator

Recently updated my Airflow and saw BigQueryExecuteQueryOperator has been deprecated in favor of BigQueryInsertJobOperator. The documentation seemed rather vague, linking to REST Resource: jobs (and Method: jobs.query). In particular, it's unclear to me whether there's anywhere I can specify write_disposition, destination_dataset_table, etc. I want to make sure I'm not making things overly complicated.

Where I currently do

# my.sql
SELECT * FROM `proj.ds.table_1`
---------------------------------
# my-dag.py
BigQueryExecuteQueryOperator(
    task_id='copy-table-1',
    sql = 'my.sql',
    destination_dataset_table='proj:ds.table_2',
    write_disposition='WRITE_EMPTY',
    dag=dag
)

do I now need to use DDL statements like

# my.sql
CREATE TABLE IF NOT EXISTS
ds.table_2
AS (
  SELECT * FROM `proj.ds.table_1`
)
---------------------------------
# my-dag.py
BigQueryInsertJobOperator(
    task_id='copy-table-1',
    configuration={
        'query': my.sql
    },
    dag=dag
)

Not to mention passing parameters to the query in the format suggested by the QueryParameter docs instead of just params={ 'table': THE_TABLE }...

like image 683
zack Avatar asked Mar 13 '26 20:03

zack


1 Answers

This is the API documentation to follow for the BigQueryInsertJobOperator: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationquery.

Rather than mess with Google's parameterized SQL queries, I'm using this method to get templated queries to write to BQ:

# my.sql
SELECT * FROM `{PROJECT}.{DATASET}.{TBL_TO_MOVE}`
---------------------------------
# my-dag.py
PROJECT = 'my-project'
DATASET = 'my-dataset'
TBL_TO_MOVE = 'some-table'

DESTINATION_DS = 'other-dataset'
DESTINATION_TBL = 'other-table'

BigQueryInsertJobOperator(
    task_id='copy-table-1',
    configuration={
        'query': {
            'query': open('my.sql', 'r').read().format(**locals()),
            'destinationTable': {
                'projectId': PROJECT,
                'datasetId': DESTINATION_DS,
                'tableId': DESTINATION_TBL
            },
            'useLegacySql': False,
            'allowLargeResults': True,
        }
    },
    dag=dag
)

The open('my.sql', ...) reads the SQL file then uses local variables in place of the brace variables (e.g. {PROJECT} gets replaced by my-project).

like image 134
zack Avatar answered Mar 15 '26 09:03

zack



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!