BigQuery: How to Avoid "Resources exceeded during query execution." error

I'm wondering how I can avoid the "resources exceeded during query execution" error. Most of the other questions about this involve JOIN EACH or GROUP EACH BY, but I'm not using either of those. If I include the WHERE clause on the date or on ABS(HASH(userId)), the query works, but I'd like to have the entire data set available so that I can filter it further in Tableau.

If I remove t4, the query works, but I want that last column, and later queries will need even more columns derived from the event_parameters field.

The job ID was rhi-localytics-db:job_6MaesvuMK6mP6irmAnrcM9R3cx8, in case that helps. Thanks.

SELECT
    t1.userId as userId,
    t1.event_time AS event_time,
    t1.Diamond_Balance as Diamond_Balance,
    t2.Diamond_Change as Diamond_Change,
    t3.Gold_Balance as Gold_Balance,
    t4.Gold_Change as Gold_Change
FROM (
    SELECT
        userId,
        event_time,
        INTEGER(event_parameters.Value) AS Diamond_Balance,
    FROM
        FLATTEN([game_data], event_parameters)
    WHERE
        event_name LIKE 'Currency'
        AND event_parameters.Name = 'Diamond_Balance'
        -- and date(event_time) > '2015-09-11'
        -- AND ABS(HASH(userId) % 5)  = 0
    GROUP BY
        userId,
        event_time,
        Diamond_Balance ) AS t1
INNER JOIN (
    SELECT
        userId,
        event_time,
        INTEGER(event_parameters.Value) AS Diamond_Change,
    FROM
        FLATTEN([game_data], event_parameters)
    WHERE
        event_name LIKE 'Currency'
        AND event_parameters.Name = 'Diamond_Change'
        AND INTEGER(event_parameters.Value ) < 14000
        AND INTEGER(event_parameters.Value ) > -14000
        -- and date(event_time) > '2015-09-11'
        -- AND ABS(HASH(userId) % 5)  = 0

    GROUP BY
        userId,
        event_time,
        Diamond_Change ) AS t2
ON
    t1.userId = t2.userId
    AND t1.event_time = t2.event_time
INNER JOIN (
    SELECT
        userId,
        event_time,
        event_parameters.Value AS Gold_Balance,
    FROM
        FLATTEN([game_data], event_parameters)
    WHERE
        event_name LIKE 'Currency'
        AND event_parameters.Name = 'Gold_Balance'
        -- and date(event_time) > '2015-09-11'
        -- AND ABS(HASH(userId) % 5)  = 0

    GROUP BY
        userId,
        event_time,
        Gold_Balance ) AS t3
ON
    t1.userId = t3.userId
    AND t1.event_time = t3.event_time
INNER JOIN (
    SELECT
        userId,
        event_time,
        INTEGER(event_parameters.Value) AS Gold_Change,
    FROM
        FLATTEN([game_data], event_parameters)
    WHERE
        event_name LIKE 'Currency'
        AND event_parameters.Name = 'Gold_Change'
        -- and date(event_time) > '2015-09-11'
        -- AND ABS(HASH(userId) % 5)  = 0
    GROUP BY
        userId,
        event_time,
        Gold_Change ) AS t4
ON
    t1.userId = t4.userId
    AND t1.event_time = t4.event_time
asked Sep 22 '15 at 19:09 by Davidjb
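The commented-out `ABS(HASH(userId) % 5) = 0` filter in the question keeps roughly one in five users, chosen deterministically by hashing the user id. That is why the query succeeds with it: every subquery and join works on about a fifth of the data. A minimal sketch of the same idea in Python, assuming `zlib.crc32` as a stand-in for BigQuery's HASH() (the two are different functions, so the buckets will not match BigQuery's) and hypothetical event rows:

```python
import zlib


def in_sample(user_id: str, buckets: int = 5, keep: int = 0) -> bool:
    """Return True if user_id hashes into bucket `keep` out of `buckets`.

    CRC32 is only a stand-in for BigQuery's HASH(). zlib.crc32 returns a
    non-negative int in Python 3, so no ABS() is needed here.
    """
    return zlib.crc32(user_id.encode("utf-8")) % buckets == keep


def sample_events(events, buckets=5, keep=0):
    # Sample by user, not by event: a kept user keeps every one of its events,
    # so joins on (userId, event_time) still line up inside the sample.
    return [e for e in events if in_sample(e["userId"], buckets, keep)]


# Hypothetical event rows: 20 users, 100 events.
events = [{"userId": f"user{i % 20}", "event_time": i} for i in range(100)]
sampled = sample_events(events)
print(f"kept {len(sampled)} of {len(events)} events")
```

Because the bucket depends only on the user id, running the same filter again returns the same users, and the five buckets together cover every event exactly once.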


2 Answers

General advice on "resources exceeded" errors can be found here: https://stackoverflow.com/a/16579558/1375400

Note that adding EACH is generally the solution to, rather than the cause of, a resources exceeded error. (Though there are cases where it can work the other way around!)

Also, EACH is no longer meaningful on GROUP BY, and will shortly be irrelevant on JOIN.

answered Sep 20 '22 at 00:09 by Jeremy Condit


I think you should be able to do all of your logic in a single scan, with no joins at all.
Something like the query below. It's just an idea, but it has some chance of working as is :)

SELECT
    userId,
    event_time,
    MAX(CASE WHEN event_parameters.Name = 'Diamond_Balance' 
            THEN INTEGER(event_parameters.Value) END) AS Diamond_Balance,
    MAX(CASE WHEN event_parameters.Name = 'Diamond_Change'
                  AND INTEGER(event_parameters.Value) BETWEEN -14000 AND 14000
            THEN INTEGER(event_parameters.Value) END) AS Diamond_Change,
    MAX(CASE WHEN event_parameters.Name = 'Gold_Balance' 
            THEN INTEGER(event_parameters.Value) END) AS Gold_Balance,
    MAX(CASE WHEN event_parameters.Name = 'Gold_Change' 
            THEN INTEGER(event_parameters.Value) END) AS Gold_Change
FROM
    FLATTEN([game_data], event_parameters)
WHERE
    event_name LIKE 'Currency'
GROUP BY
    userId,
    event_time
answered Sep 19 '22 at 00:09 by Mikhail Berlyant
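The single-scan (conditional aggregation) query in the last answer can be sanity-checked locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for BigQuery, so the legacy-SQL pieces are swapped out: a pre-flattened table named `flat` (hypothetical) replaces FLATTEN([game_data], event_parameters), and CAST(value AS INTEGER) replaces INTEGER(). It runs the pivot next to a four-way self-join shaped like t1..t4 in the question, on a few made-up rows. It says nothing about BigQuery's resource usage; it only compares what the two query shapes return.

```python
import sqlite3

# Hypothetical stand-in for FLATTEN([game_data], event_parameters):
# one row per (userId, event_time, parameter name, parameter value).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flat (userId TEXT, event_time INTEGER, event_name TEXT,"
    " name TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO flat VALUES (?, ?, ?, ?, ?)",
    [
        ("u1", 1, "Currency", "Diamond_Balance", "100"),
        ("u1", 1, "Currency", "Diamond_Change", "-5"),
        ("u1", 1, "Currency", "Gold_Balance", "300"),
        ("u1", 1, "Currency", "Gold_Change", "20"),
        ("u2", 2, "Currency", "Diamond_Balance", "50"),
        ("u2", 2, "Currency", "Diamond_Change", "10"),
        ("u2", 2, "Currency", "Gold_Balance", "70"),  # no Gold_Change for u2
    ],
)

# Single scan: conditional aggregation turns each parameter name into a column.
PIVOT_SQL = """
SELECT userId, event_time,
  MAX(CASE WHEN name = 'Diamond_Balance' THEN CAST(value AS INTEGER) END),
  MAX(CASE WHEN name = 'Diamond_Change'
            AND CAST(value AS INTEGER) BETWEEN -14000 AND 14000
           THEN CAST(value AS INTEGER) END),
  MAX(CASE WHEN name = 'Gold_Balance' THEN CAST(value AS INTEGER) END),
  MAX(CASE WHEN name = 'Gold_Change' THEN CAST(value AS INTEGER) END)
FROM flat
WHERE event_name = 'Currency'
GROUP BY userId, event_time
ORDER BY userId, event_time
"""


def param(name):
    # One subquery per parameter, like t1..t4 in the question
    # (the +/-14000 filter is omitted; the sample values are in range anyway).
    return (
        "(SELECT userId, event_time, CAST(value AS INTEGER) AS v FROM flat"
        f" WHERE event_name = 'Currency' AND name = '{name}')"
    )


# Four subqueries over the same table, stitched together with INNER JOINs.
JOIN_SQL = f"""
SELECT t1.userId, t1.event_time, t1.v, t2.v, t3.v, t4.v
FROM {param('Diamond_Balance')} AS t1
JOIN {param('Diamond_Change')} AS t2
  ON t1.userId = t2.userId AND t1.event_time = t2.event_time
JOIN {param('Gold_Balance')} AS t3
  ON t1.userId = t3.userId AND t1.event_time = t3.event_time
JOIN {param('Gold_Change')} AS t4
  ON t1.userId = t4.userId AND t1.event_time = t4.event_time
ORDER BY t1.userId, t1.event_time
"""

pivot = conn.execute(PIVOT_SQL).fetchall()
joined = conn.execute(JOIN_SQL).fetchall()
print(pivot)
print(joined)
```

For events that carry all four parameters, both shapes return the same values. The difference shows up with u2: the INNER JOINs drop an event that lacks any one parameter, while the pivot keeps it with NULL in the missing column (add a HAVING clause if the join's behavior is wanted). Also, if an event had several distinct values for the same parameter, MAX() would keep only one of them, whereas the question's GROUP BY subqueries would emit every combination.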