Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Avoid correlated subqueries error in BigQuery

I have a simple query to obtain the currency rate in use at the time a transaction was created:

SELECT t.orderid, t.date, 
 (SELECT rate FROM sources.currency_rates r WHERE currencyid=1 AND 
r.date>=t.date  ORDER BY date LIMIT 1) rate
FROM sources.transactions t

This triggers an error:

Error: Correlated subqueries that reference other tables are not 
supported unless they can be de-correlated, such as by transforming 
them into an efficient JOIN.' 

I've tried with several types of joins and named subqueries, but none seem to work. What is the best way to accomplish this? Seems like a very common scenario that should be quite straightforward to implement in BQ's Standard Sql.

like image 490
Eric Escalante Avatar asked Jul 11 '17 16:07

Eric Escalante


People also ask

Should correlated subqueries be avoided?

While correlated subquery runs for each row returned by the outer query because the output of the whole query is based upon comparing the data returned by one row to all other rows of the table. That's why it is also very slow and generally avoided until you don't know any other way to solve the problem.

What is a correlated subquery and what are the possible issues with using it?

A correlated subquery is a subquery that refers to a column of a table that is not in its FROM clause. The column can be in the Projection clause or in the WHERE clause. In general, correlated subqueries diminish performance.

Is correlated subquery better than join?

The advantage of a join includes that it executes faster. The retrieval time of the query using joins almost always will be faster than that of a subquery. By using joins, you can maximize the calculation burden on the database i.e., instead of multiple queries using one join query.

How many times correlated subquery will get executed?

A correlated SQL subquery is just a subquery that is executed many times—once for each record (row) returned by the outer (main) query. In other words, the outer query returns a table with multiple rows; the inner query then runs once for each of those rows.


1 Answers

Below is for BigQuery Standard SQL

#standardSQL
SELECT 
  t.orderid AS orderid, 
  t.date AS date, 
  ARRAY_AGG(r.rate ORDER BY r.date LIMIT 1)[SAFE_OFFSET(0)] AS rate
FROM `sources.transactions` AS t
JOIN `sources.currency_rates` AS r
ON currencyid = 1 
AND r.date >= t.date
GROUP BY orderid, date
like image 159
Mikhail Berlyant Avatar answered Oct 01 '22 20:10

Mikhail Berlyant