When I query a partitioned table, is it possible to filter by partition column with a subquery and reduce cost at the same time?

Tags:

google-bigquery

I can see from the public documentation that BigQuery partitioned tables have a limitation: if the filter on the partition column is a subquery, the queried partitions are not pruned, so "bytes processed" (cost) is not reduced. I'm wondering if there is a way to work around this.

For example, this query scans 38.67 GB. Is there a way to reduce that?

WITH sub_query_that_generates_filter AS (
  SELECT DATE "2016-10-01" as month UNION ALL
  SELECT "2017-10-01" UNION ALL
  SELECT "2018-10-01"
)
SELECT block_hash, fee FROM `bigquery-public-data.crypto_bitcoin.transactions`
WHERE block_timestamp_month in 
(SELECT month FROM sub_query_that_generates_filter)


asked Oct 03 '19 20:10

Yun Zhang

1 Answer

With BigQuery scripting, there is a way to reduce the cost.

Basically, a script variable is declared to capture the dynamic result of the subquery. The subsequent query then uses that variable as the filter on the partition column. Because the variable's value is already fixed when that query runs, BigQuery can prune the partitions to be scanned.

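-- Step 1: materialize the filter values (here, three months) in a temp table.
CREATE TEMP TABLE sub_query_that_generates_filter AS (
  SELECT DATE "2017-10-01" AS month UNION ALL
  SELECT DATE "2018-10-01" UNION ALL
  SELECT DATE "2016-10-01"
);
BEGIN
  -- Step 2: capture the subquery result in a script variable.
  -- DECLARE statements must come first inside the BEGIN block.
  DECLARE month_filter ARRAY<DATE>
    DEFAULT (SELECT ARRAY_AGG(month) FROM sub_query_that_generates_filter);

  -- Step 3: the variable's value is fixed before this query runs,
  -- so only the matching block_timestamp_month partitions are scanned.
  SELECT block_hash, fee FROM `bigquery-public-data.crypto_bitcoin.transactions`
    WHERE block_timestamp_month IN UNNEST(month_filter);
END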

It scans only 2 GB of data instead of 38 GB. Cheaper and faster!
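To make the mechanism concrete, here is a toy model in Python (a sketch, not BigQuery's actual planner). It assumes the planner decides which partitions to read before the query runs, so it can prune only when the filter values are constants. A subquery's values are unknown at that point, so every partition is scanned; running the subquery first and passing its result on as a fixed value, as the script above does, restores pruning. The partition layout and the uniform 1 GB per monthly partition are made-up assumptions, and `PARTITIONS`, `plan_scan`, `bytes_scanned`, and `subquery` are hypothetical names.

```python
from datetime import date
from typing import Callable, FrozenSet, Iterable, List, Union

# Hypothetical layout: one monthly partition per month of 2016-2018, each 1 GB.
# Made-up numbers, not the real sizes of the public bitcoin table.
PARTITIONS = {date(y, m, 1): 1.0 for y in (2016, 2017, 2018) for m in range(1, 13)}

# A filter is either a constant set of months (known at planning time) or a
# callable standing in for a subquery (its values exist only at execution time).
MonthFilter = Union[FrozenSet[date], Callable[[], Iterable[date]]]


def plan_scan(month_filter: MonthFilter) -> List[date]:
    """Return the partitions the planner will read, decided before execution."""
    if isinstance(month_filter, frozenset):
        # Constant filter: compare it with the partition keys and prune.
        return sorted(p for p in PARTITIONS if p in month_filter)
    # Dynamic filter (e.g. IN (SELECT ...)): values are unknown while planning,
    # so every partition must be scanned.
    return sorted(PARTITIONS)


def bytes_scanned(month_filter: MonthFilter) -> float:
    """Total size (GB) of the partitions chosen by plan_scan."""
    return sum(PARTITIONS[p] for p in plan_scan(month_filter))


def subquery() -> List[date]:
    # Stand-in for sub_query_that_generates_filter; reads no partitions here.
    return [date(2016, 10, 1), date(2017, 10, 1), date(2018, 10, 1)]


# Single statement: the subquery filter is dynamic, so nothing is pruned.
single_query_cost = bytes_scanned(subquery)

# Script, step 1: run the subquery and store its result in a variable.
month_filter = frozenset(subquery())
# Script, step 2: the variable is now a constant, so the planner prunes.
script_cost = bytes_scanned(month_filter)

print(single_query_cost, script_cost)
```

With these hypothetical numbers it prints `36.0 3.0`. The model treats the subquery itself as free because, as in the question, it is built from literals; in a real script, the statement that fills the variable would also be billed for any table data it reads.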


answered Sep 28 '22 08:09

Yun Zhang
