I am trying to run a query similar to:
Select Table1.a,Table1.b,Table2.c
From
(Select a, max(x) as b
from Tbl1
group by a
) as Table1
LEFT JOIN EACH Table2
ON
Join Condition
and I get "Table1 is an inline table, and so cannot be in outer part of an outer join."
when changing the Join from Outer to Inner, it works (hence no syntax issue...).
Can someone please explain this message, and what should i do differently to avoid it?
Thanks
TL;DR: Use GROUP EACH BY
in the inner query.
The inner query
Select a, max(x) as b from Tbl1 group by a
is not parallizable. You can compute partial results in parallel, but but in order to know the global max(x)
for each value of a
', you need to have that run in only one place. When you use JOIN EACH
in the outer query, you're instructing the query engine that you need to parallelize the JOIN
execution. However, you don't have a paralelizable source, so the query fails.
There are two ways to fix this: The first is to use an explicitly parallel version of the inner query -- just use GROUP EACH BY
instead of GROUP BY
. This can execute in parallel because the first step is to sort the underlying table by the a
field, so that you can compute the global maximum per a
field in a parallel worker. If the inner query is paralellizable, then the outer query can execute in parallel as well. This would look like:
Select Table1.a,Table1.b,Table2.c
From
(Select a, max(x) as b
from Tbl1
group EACH by a -- note the EACH keyword here
) as Table1
LEFT JOIN EACH Table2
ON
Join Condition
The second option is to use JOIN
instead of JOIN EACH
on the outer query. This might run into size limits depending on the size of Table2 (since to do a non-each join, the table on the right side needs to be 'small'.).
This is admittedly sub-optimal; you have to understand how the query engine works in order to make your queries run. That said, we (on the bigquery and dremel teams) are working hard on making queries 'just work' so that you don't have to deal with this kind of issue. We're making some progress on this, a number of these types of errors have been removed in the last couple of months (for example, an inner join would have failed too until recently), but we still have a ways to go.
Thanks for bringing this to our attention. I've filed an internal bug so that we can make this work better in the future.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With