
BigQuery - using SQL UDF in join predicate

I'm trying to use a SQL UDF when running a left join, but get the following error:

Subquery in join predicate should only depend on exactly one join side.

Query is:

CREATE TEMPORARY FUNCTION game_match(game1 STRING, game2 STRING) AS (
  STRPOS(game1, game2) > 0
);

SELECT
  t1.gameId
FROM `bigquery-public-data.baseball.games_post_wide` t1
LEFT JOIN `bigquery-public-data.baseball.games_post_wide` t2
  ON t1.gameId = t2.gameId
  AND game_match(t1.gameId, t2.gameId)

When I write the condition inline (strpos(t1.gameId, t2.gameId) > 0) instead of calling the function, the query works.

Is there something problematic with this specific function, or are SQL UDFs generally not supported in join predicates (for some reason)?

Lior asked Feb 26 '19 08:02


1 Answer

You could file a feature request on the issue tracker to make this work. It's a limitation of query planning/optimization; for some background, BigQuery converts the function call so that the query's logical representation is like this:

SELECT
  t1.gameId
FROM `bigquery-public-data.baseball.games_post_wide` t1
LEFT JOIN `bigquery-public-data.baseball.games_post_wide` t2
  ON t1.gameId = t2.gameId
  AND (SELECT STRPOS(game1, game2) > 0 FROM (SELECT t1.gameId AS game1, t2.gameId AS game2))

BigQuery transforms the SQL UDF call like this because it needs to avoid computing the inputs more than once. That isn't an issue in this particular case, but it makes a difference when the UDF body references one of its inputs more than once. For example, consider this UDF:

CREATE TEMP FUNCTION Foo(x FLOAT64) AS (x - x);
SELECT Foo(RAND());

If BigQuery were to inline the expression directly, you'd end up with this:

SELECT RAND() - RAND();

Each RAND() call would be evaluated independently, so the result would generally not be zero, which is unexpected given the definition of the UDF.
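As a rough illustration of the evaluate-once point, here is a small Python analogy. It is not BigQuery code and says nothing about how BigQuery is implemented; the names foo and foo_inlined are hypothetical stand-ins for the Foo UDF and for the naively inlined expression.

```python
import random


def foo(x: float) -> float:
    # Analogue of the UDF body (x - x): the argument is computed once
    # by the caller, then the same value is used twice.
    return x - x


def foo_inlined() -> float:
    # Analogue of naive textual inlining, RAND() - RAND():
    # two independent random draws.
    return random.random() - random.random()


random.seed(0)  # fixed seed only so repeated runs behave the same
bound_once = [foo(random.random()) for _ in range(5)]
inlined = [foo_inlined() for _ in range(5)]

print(bound_once)  # every entry is 0.0
print(inlined)     # entries are differences of two separate draws
```

The first list is all zeros because each input is computed once and then reused, which is the behavior the subselect form preserves. The second list shows what plain inlining would give.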

In most cases, BigQuery's logical optimizations transform the more complicated subselect as shown above into a simpler form, assuming that doing so doesn't change the semantics of the query. That didn't happen in this case, though, hence the error.

Elliott Brossard answered Nov 18 '22 22:11