I'm trying to optimize our BigQuery model by using clustered tables.
I'm testing these scenarios:
Without applying any WHERE condition, scenarios 1 and 2 have equal cost (time and bytes processed). When I filter on a clustered field, scenario 1 is about 4x faster and cheaper.
Are clustered fields only useful when the query has a filter condition, and not in a join? In this case, if I perform a join without any filter, performance is the same with or without clustering.
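For reference, the two scenarios look roughly like this (table and column names are placeholders, not my real schema):

```sql
-- Scenario 1: clustered table; filtering on the clustering column prunes blocks
SELECT COUNT(*)
FROM mydataset.orders_clustered      -- CLUSTER BY customer_id
WHERE customer_id = 'C-123';

-- Scenario 2: same data, non-clustered; the same filter scans the full table
SELECT COUNT(*)
FROM mydataset.orders_plain
WHERE customer_id = 'C-123';
```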
How can I improve a join between two tables in BigQuery?
EDIT 2021-05-31
Added the query execution plans of both jobs:
Clustered
Non-clustered
Note that this limitation applied to Google BigQuery's legacy SQL, which did not support some join types, such as a full outer join or right outer join, and only allowed the equality (=) operator to compare columns in a join condition. Standard SQL does not have these restrictions.
You can create a clustered table by querying either a partitioned table or a non-partitioned table. You cannot change an existing table to a clustered table by using query results.
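For example, a clustered table can be created from query results like this (the table and column names are illustrative):

```sql
-- Hypothetical: build a partitioned and clustered copy of an existing table
CREATE TABLE mydataset.orders_clustered
PARTITION BY DATE(order_ts)
CLUSTER BY customer_id
AS
SELECT *
FROM mydataset.orders_plain;
```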
Partition pruning happens before the query runs, so you can estimate the cost after partition pruning with a dry run. Cluster pruning happens while the query runs, so the cost is known only after the query finishes. Partitioning also requires partition-level management, which clustering does not.
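As a sketch, a dry run (here via the bq CLI; the project, dataset, and query are placeholders, and the command requires authenticated access) reports the bytes that would be processed after partition pruning, without actually running the query:

```shell
# Hypothetical dry run: validates the query and estimates bytes scanned
bq query --dry_run --use_legacy_sql=false '
SELECT COUNT(*)
FROM mydataset.orders_clustered
WHERE DATE(order_ts) = "2021-05-01"'
```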
From the docs, I would say that clustering is simply ignored here, because the join compares on a different column.
Now, to optimize a join, try to reduce the data before joining. For instance, filter the tables, or pre-aggregate them, to shrink the inputs as much as you can. Finally, also mind the order of the tables in the join: order them from the biggest to the smallest.
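As a rough sketch of that advice (table names, columns, and filters are made up):

```sql
-- Reduce each side before the join: filter early and pre-aggregate,
-- then join the big table against the smaller, already-reduced input.
SELECT o.customer_id, o.total_orders, c.segment
FROM (
  SELECT customer_id, COUNT(*) AS total_orders
  FROM mydataset.orders_plain            -- biggest table first
  WHERE DATE(order_ts) >= '2021-01-01'   -- filter before joining
  GROUP BY customer_id                   -- pre-aggregate to shrink the input
) AS o
JOIN mydataset.customers AS c            -- smaller table second
  ON o.customer_id = c.customer_id;
```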