I would like to run a query that partitions table A using a value from table B. For example:
#standard SQL
select A.user_id
from my_project.xxx A
inner join my_project.yyy B
on A._partitiontime = timestamp(B.date)
where B.date = '2018-01-01'
This query will scan all the partitions in table A and will not take into consideration the date I specified in the where clause (for partitioning purposes). I have tried running this query in several different ways but all produced the same result - scanning all partitions in table A. Is there any way around it?
Thanks in advance.
Query Specific Partitions when you create a table partitioned by according to a TIMESTAMP or DATE column. Tables partitioned according to a TIMESTAMP or DATE column do not have pseudo-columns! To limit the number of partitions analyzed when querying partitioned tables, you can use a predicate filter (WHERE clause).
A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you can control costs by reducing the number of bytes read by a query.
With BigQuery scripting (Beta now), there is a way to prune the partitions.
Basically, a scripting variable is defined to capture the dynamic part of a subquery. Then in subsequent query, scripting variable is used as a filter to prune the partitions to be scanned.
DECLARE date_filter ARRAY<DATETIME>
DEFAULT (SELECT ARRAY_AGG(date) FROM B WHERE ...);
select A.user_id
from my_project.xxx A
inner join my_project.yyy B
on A._partitiontime = timestamp(B.date)
where A._partitiontime IN UNNEST(date_filter)
The doc says this about your use case:
Express the predicate filter as closely as possible to the table identifier. Complex queries that require the evaluation of multiple stages of a query in order to resolve the predicate (such as inner queries or subqueries) will not prune partitions from the query.
The following query does not prune partitions (note the use of a subquery):
#standardSQL
SELECT
t1.name,
t2.category
FROM
table1 t1
INNER JOIN
table2 t2
ON
t1.id_field = t2.field2
WHERE
t1.ts = (SELECT timestamp from table3 where key = 2)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With