Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Google BigQuery asking for JOIN EACH but I'm already using it

I'm trying to run a query in BigQuery which has two sub selects and a join, but I can't get it to work. What I'm doing as a workaround is to run the subselects by themselves, then saving them as tables, then doing another query with a join, but I think I should be able to do this with one query.

I'm getting the error:

Table too large for JOIN. Consider using JOIN EACH. For more details, please see https://developers.google.com/bigquery/docs/query-reference#joins

but I'm already using a join each. I've tried using a cross join and using group by each but those give me different errors. The other questions on Stack Overflow about this topic don't help, one says it was a bug in BigQuery and the other was somebody using 'cross join each'...

Below is my sql, forgive me if it's full of errors but I think it should work:

select
t1.device_uuid,
t1.session_uuid,
t1.nth,
t1.Diamonds_Launch,
t2.Diamonds_Close
from (
    select
    device_uuid,
    session_uuid,
    nth,
    sum(cast([project_id].[table_id].attributes.Value as integer)) as Diamonds_Launch
    from [project_id].[table_id]
    where name = 'App Launch'
    and attributes.Name = 'Inventory - Diamonds'
    group by device_uuid, session_uuid, nth
    ) as t1
join each (
    select
    device_uuid,
    session_uuid,
    nth,
    sum(cast([project_id].[table_id].attributes.Value as integer)) as Diamonds_Close
    from [project_id].[table_id]
    where name = 'App Close'
    and attributes.Name = 'Inventory - Diamonds'
    group by device_uuid, session_uuid, nth
    ) as t2
on t1.device_uuid = t2.device_uuid
and t1.session_uuid = t2.session_uuid
like image 784
Davidjb Avatar asked Feb 24 '15 19:02

Davidjb


2 Answers

You've got a GROUP BY inside a JOIN EACH. GROUP BY hits limits with cardinality (the number of distinct values) and the final grouping is not parallelizable. This limits BigQuery's ability to do the join.

If you change the GROUP BY to GROUP EACH BY, this will most likely work.

(yes, I realize that this is unpleasant and non-standard. The BigQuery team is currently working hard on making things like this 'just work'.)

like image 192
Jordan Tigani Avatar answered Jan 02 '23 22:01

Jordan Tigani


This can be combined to one single query:

SELECT device_uuid,
       session_uuid,
       nth,
       SUM(IF (name = 'App Launch', INTEGER([project_id].[table_id].attributes.Value), 0)) AS Diamonds_Launch,
       SUM(IF (name = 'App Close', INTEGER([project_id].[table_id].attributes.Value), 0)) AS Diamonds_Close,
FROM [project_id].[table_id]
WHERE attributes.Name = 'Inventory - Diamonds'
GROUP BY device_uuid,
         session_uuid,
         nth

You also have to use GROUP EACH for large tables.

like image 28
Pentium10 Avatar answered Jan 02 '23 22:01

Pentium10