Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

conditional join in bigquery

I have two tables.

Table 1 is a single COLUMN of integers.

Table 2 has three COLUMNS : start_integer, end_integer, data

The simple query is to join the column of integers with data where

  integer >= start_integer AND integer <= end_integer

In many SQL implementations this can be accomplished with a left conditional JOIN ... ON BETWEEN

SELECT tbl1.integer, tbl2.data FROM tbl1
LEFT JOIN tbl2 ON tbl1.integer BETWEEN tbl2.start_integer AND 
tbl2.end_integer;

But it seems BigQuery supports only JOIN ON with only an = condition.

This could be accomplished with a cross join, but BigQuery complains that my tables are too big. CROSS JOIN EACH is invalid.

How can I accomplish this join task within the limitations of BigQuery's SQL?

Below is my BigQuery SQL:

SELECT tbl1.integer, tbl2.data
FROM bq:data.tbl1 
CROSS JOIN bq:data.tbl2
WHERE tbl1.integer BETWEEN tbl2.start_integer AND tbl2.end_integer;

Which returns the error:

Error: 4.1 - 4.132: The JOIN operator's right-side table must be a small table. Switch the tables if the left-side table is smaller, or use JOIN EACH if both tables are larger than the maximum described at http://goo.gl/wXqgHs.

like image 994
Brett Bonner Avatar asked Jul 30 '15 08:07

Brett Bonner


People also ask

Can you do joins in BigQuery?

Google BigQuery does not support other join types, such as a full outer join or right outer join. In addition, Google BigQuery uses the default equals (=) operator to compare columns and does not support other operators.

What is cross join in BigQuery?

Cross joins are queries where each row from the first table is joined to every row in the second table (there are non-unique keys on both sides). The worst case output is the number of rows in the left table multiplied by the number of rows in the right table. In extreme cases, the query might not finish.

What is coalesce in BigQuery?

COALESCE(expr[, ... ]) Description. Returns the value of the first non-null expression. The remaining expressions are not evaluated.

WHAT IS WITH clause in BigQuery?

The WITH clause contains one or more common table expressions (CTEs). Each CTE binds the results of a subquery to a table name, which can be used elsewhere in the same query expression. BigQuery does not materialize the results of non-recursive CTEs within the WITH clause.


2 Answers

Good news (2016)! BigQuery does support inequality joins now - make sure to uncheck the "use legacy SQL option".

Example query:

SELECT * 
FROM (
  SELECT 1 x
) a JOIN (
  SELECT 2 y
) b
ON a.x<b.y

With legacy SQL:

Error: ON clause must be AND of = comparisons of one field name from each table, ...

With standard SQL:

1     2

enter image description here

  • Docs: https://cloud.google.com/bigquery/sql-reference/enabling-standard-sql
  • Discussion: https://code.google.com/p/google-bigquery/issues/detail?id=448#c31
like image 116
Felipe Hoffa Avatar answered Oct 01 '22 18:10

Felipe Hoffa


BigQuery does not support cross joins on right-side tables.

like image 29
Brett Bonner Avatar answered Oct 01 '22 18:10

Brett Bonner