I'm trying to run a query in BigQuery which has two sub selects and a join, but I can't get it to work. What I'm doing as a workaround is to run the subselects by themselves, then saving them as tables, then doing another query with a join, but I think I should be able to do this with one query.
I'm getting the error:
Table too large for JOIN. Consider using JOIN EACH. For more details, please see https://developers.google.com/bigquery/docs/query-reference#joins
but I'm already using a join each. I've tried using a cross join and using group by each but those give me different errors. The other questions on Stack Overflow about this topic don't help, one says it was a bug in BigQuery and the other was somebody using 'cross join each'...
Below is my sql, forgive me if it's full of errors but I think it should work:
select
t1.device_uuid,
t1.session_uuid,
t1.nth,
t1.Diamonds_Launch,
t2.Diamonds_Close
from (
select
device_uuid,
session_uuid,
nth,
sum(cast([project_id].[table_id].attributes.Value as integer)) as Diamonds_Launch
from [project_id].[table_id]
where name = 'App Launch'
and attributes.Name = 'Inventory - Diamonds'
group by device_uuid, session_uuid, nth
) as t1
join each (
select
device_uuid,
session_uuid,
nth,
sum(cast([project_id].[table_id].attributes.Value as integer)) as Diamonds_Close
from [project_id].[table_id]
where name = 'App Close'
and attributes.Name = 'Inventory - Diamonds'
group by device_uuid, session_uuid, nth
) as t2
on t1.device_uuid = t2.device_uuid
and t1.session_uuid = t2.session_uuid
You've got a GROUP BY
inside a JOIN EACH
. GROUP BY
hits limits with cardinality (the number of distinct values) and the final grouping is not parallelizable. This limits BigQuery's ability to do the join.
If you change the GROUP BY
to GROUP EACH BY
, this will most likely work.
(yes, I realize that this is unpleasant and non-standard. The BigQuery team is currently working hard on making things like this 'just work'.)
This can be combined to one single query:
SELECT device_uuid,
session_uuid,
nth,
SUM(IF (name = 'App Launch', INTEGER([project_id].[table_id].attributes.Value), 0)) AS Diamonds_Launch,
SUM(IF (name = 'App Close', INTEGER([project_id].[table_id].attributes.Value), 0)) AS Diamonds_Close,
FROM [project_id].[table_id]
WHERE attributes.Name = 'Inventory - Diamonds'
GROUP BY device_uuid,
session_uuid,
nth
You also have to use GROUP EACH for large tables.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With