I need to create a simple ETL pipeline in Python using Pandas/pandas-gbq, reading each day between a given date range from BigQuery into a Pandas dataframe and create separate daily tables from query result (writing back to BigQuery).
Though there may be better and more efficient approaches (note: I'm not a software engineer), I'm currently looking into Parameterized Queries in BigQuery to parameterize date column and iterate this in a for-loop in Python.
Does anyone know if pandas-gbq currently supports Parameterized Queries? Thanks in advance.
The answer posted by Ben P needs some improvements with Pandas GBQ as we get the error below:
GenericGBQException: Reason: 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/my-bigquery/jobs: Missing query parameter value
The important and missing configuration parameter is:
'parameterValue': {'value': 'TX'}
which can be checked in Running a query with a configuration docs sample.
Keeping the query config together we get:
query_config = {
'query': {
'parameterMode': 'NAMED',
'queryParameters': [
{
'name': 'state',
'parameterType': {'type': 'STRING'},
'parameterValue': {'value': 'TX'}
},
]
}
}
I think this will help most of us who are coming to get the solution to the problem.
Yes, it does. However, I would recommend you make the switch to the official Google BigQuery Client Library, which also supports parameters.
BigQuery Client Libraries: https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-python
Setting parameters with Pandas GBQ
You can set parameters in an Pandas GBQ query using the configuration parameter, to quote from the Pandas GBQ docs:
configuration : dict, optional Query config parameters for job processing. For example:
configuration = {‘query’: {‘useQueryCache’: False}}
Here is a full code example from that link which describes how you can parameterize a query in Pandas GBQ:
import pandas
sql = """
SELECT name
FROM `bigquery-public-data.usa_names.usa_1910_current`
WHERE state = @state
"""
query_config = {
'query': {
'parameterMode': 'NAMED',
'queryParameters': [
{
'name': 'state',
'parameterType': {'type': 'STRING'}
},
]
}
}
df = pandas.read_gbq(sql, configuration=query_config)
Setting parameters with the BigQuery Client Library
Here is an excellent article on migrating from Pandas-GBQ to the BigQuery Client Library: https://cloud.google.com/bigquery/docs/pandas-gbq-migration
And here is some sample Python code where I use parameters in a query using the official BQ Client Library:
table_name = "my_table"
job_config = bigquery.QueryJobConfig()
# Set the destination table
table_ref = client.dataset(dataset_id).table(table_name)
job_config.destination = table_ref
job_config.write_disposition = 'WRITE_APPEND'
sql = """
SELECT * FROM dataset.table WHERE visit_date = date
"""
query_params = [bigquery.ScalarQueryParameter('date', 'DATE', date)]
job_config.query_parameters = query_params
# Start the query, passing in the extra configuration.
query_job = client.query(
sql,
location='EU',
job_config=job_config) # API request - starts the query
query_job.result() # Waits for the query to finish
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With