Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does pandas-gbq currently support parameterised queries?

I need to create a simple ETL pipeline in Python using Pandas/pandas-gbq, reading each day between a given date range from BigQuery into a Pandas dataframe and create separate daily tables from query result (writing back to BigQuery).

Though there may be better and more efficient approaches (note: I'm not a software engineer), I'm currently looking into Parameterized Queries in BigQuery to parameterize date column and iterate this in a for-loop in Python.

Does anyone know if pandas-gbq currently supports Parameterized Queries? Thanks in advance.

like image 776
mrag Avatar asked Oct 31 '25 14:10

mrag


2 Answers

The answer posted by Ben P needs some improvements with Pandas GBQ as we get the error below:

GenericGBQException: Reason: 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/my-bigquery/jobs: Missing query parameter value

The important and missing configuration parameter is:

'parameterValue': {'value': 'TX'}

which can be checked in Running a query with a configuration docs sample.

Keeping the query config together we get:

query_config = {
    'query': {
        'parameterMode': 'NAMED',
        'queryParameters': [
            {
                'name': 'state',
                'parameterType': {'type': 'STRING'},
                'parameterValue': {'value': 'TX'}
            },
        ]
    }
}

I think this will help most of us who are coming to get the solution to the problem.

like image 191
PaladiN Avatar answered Nov 02 '25 03:11

PaladiN


Yes, it does. However, I would recommend you make the switch to the official Google BigQuery Client Library, which also supports parameters.

BigQuery Client Libraries: https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-python

Setting parameters with Pandas GBQ

You can set parameters in an Pandas GBQ query using the configuration parameter, to quote from the Pandas GBQ docs:

configuration : dict, optional Query config parameters for job processing. For example:

configuration = {‘query’: {‘useQueryCache’: False}}

Here is a full code example from that link which describes how you can parameterize a query in Pandas GBQ:

import pandas

sql = """
    SELECT name
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state = @state
"""
query_config = {
    'query': {
        'parameterMode': 'NAMED',
        'queryParameters': [
            {
                'name': 'state',
                'parameterType': {'type': 'STRING'}
            },
        ]
    }
}
df = pandas.read_gbq(sql, configuration=query_config)

Setting parameters with the BigQuery Client Library

Here is an excellent article on migrating from Pandas-GBQ to the BigQuery Client Library: https://cloud.google.com/bigquery/docs/pandas-gbq-migration

And here is some sample Python code where I use parameters in a query using the official BQ Client Library:

table_name = "my_table"
job_config = bigquery.QueryJobConfig()
# Set the destination table
table_ref = client.dataset(dataset_id).table(table_name)
job_config.destination = table_ref
job_config.write_disposition = 'WRITE_APPEND'
sql = """
SELECT * FROM dataset.table WHERE visit_date = date
"""
query_params = [bigquery.ScalarQueryParameter('date', 'DATE', date)]
job_config.query_parameters = query_params

# Start the query, passing in the extra configuration.
query_job = client.query(
    sql,
    location='EU',
    job_config=job_config)  # API request - starts the query

query_job.result()  # Waits for the query to finish
like image 30
Ben P Avatar answered Nov 02 '25 03:11

Ben P