Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Querying a Partitioned table in BigQuery using a reference from a joined table

Tags:

I would like to run a query that partitions table A using a value from table B. For example:

#standard SQL
select A.user_id
from my_project.xxx A
inner join my_project.yyy B
on A._partitiontime = timestamp(B.date)
where B.date = '2018-01-01'

This query will scan all the partitions in table A and will not take into consideration the date I specified in the where clause (for partitioning purposes). I have tried running this query in several different ways but all produced the same result - scanning all partitions in table A. Is there any way around it?

Thanks in advance.

like image 316
user9984628 Avatar asked Jul 31 '18 10:07

user9984628


People also ask

How would you query specific partitions in a BigQuery table?

Query Specific Partitions when you create a table partitioned by according to a TIMESTAMP or DATE column. Tables partitioned according to a TIMESTAMP or DATE column do not have pseudo-columns! To limit the number of partitions analyzed when querying partitioned tables, you can use a predicate filter (WHERE clause).

How does partitioning a table in BigQuery impacts performance and cost?

A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you can control costs by reducing the number of bytes read by a query.


2 Answers

With BigQuery scripting (Beta now), there is a way to prune the partitions.

Basically, a scripting variable is defined to capture the dynamic part of a subquery. Then in subsequent query, scripting variable is used as a filter to prune the partitions to be scanned.

DECLARE date_filter ARRAY<DATETIME> 
  DEFAULT (SELECT ARRAY_AGG(date) FROM B WHERE ...);

select A.user_id
from my_project.xxx A
inner join my_project.yyy B
on A._partitiontime = timestamp(B.date)
where A._partitiontime IN UNNEST(date_filter)
like image 152
Yun Zhang Avatar answered Sep 28 '22 18:09

Yun Zhang


The doc says this about your use case:

Express the predicate filter as closely as possible to the table identifier. Complex queries that require the evaluation of multiple stages of a query in order to resolve the predicate (such as inner queries or subqueries) will not prune partitions from the query.

The following query does not prune partitions (note the use of a subquery):

#standardSQL
SELECT
  t1.name,
  t2.category
FROM
  table1 t1
INNER JOIN
  table2 t2
ON
  t1.id_field = t2.field2
WHERE
  t1.ts = (SELECT timestamp from table3 where key = 2)
like image 37
Pentium10 Avatar answered Sep 28 '22 19:09

Pentium10