I'm developing what is turning out to be quite a complex query that requires me to stack data (i.e., UNION ALL) many times. To my surprise, BigQuery doesn't like the stacking, and the dry run shows this exception:
Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex.
I've isolated the point in the query where the problem arises and confirmed that it appears to be one UNION ALL too many causing the problem. I'm surprised that UNION ALL would do this, but I suspect I'm being naive here.
Why isn't BigQuery able to handle this additional UNION ALL? Isn't stacking data one of the more straightforward operations?
What are my options to achieve the same result? Is there an operation that I'm not aware of that could do the same job or an alternative method?
Here's the query in full, although I should note that project.dataset.source_view does do some relatively straightforward processing first:
WITH p0_funnel AS (
SELECT
date,
platform_type,
platform,
flow,
step_1,
step_2,
step_3,
step_4,
step_5,
step_6
FROM `project.dataset.source_view`
), p1_funnel AS (
SELECT
date,
flow,
platform_type,
platform,
SUM(step_1) AS step_1,
SUM(step_2) AS step_2,
SUM(step_3) AS step_3,
SUM(step_4) AS step_4,
SUM(step_5) AS step_5,
SUM(step_6) AS step_6
FROM p0_funnel
GROUP BY
date,
flow,
platform_type,
platform
), p2_funnel AS (
SELECT
date,
flow,
platform,
platform_type,
step_1,
step_2,
step_3,
step_4,
step_5,
step_6
FROM p1_funnel
), p3_funnel AS (
SELECT
date, platform, platform_type, flow,
'step_1' AS step,
step_1 AS step_sessions
FROM p1_funnel
UNION ALL
SELECT
date, platform, platform_type, flow,
'step_2' AS step,
step_2 AS step_sessions
FROM p1_funnel
UNION ALL
SELECT
date, platform, platform_type, flow,
'step_3' AS step,
step_3 AS step_sessions
FROM p1_funnel
UNION ALL
SELECT
date, platform, platform_type, flow,
'step_4' AS step,
step_4 AS step_sessions
FROM p1_funnel
UNION ALL
SELECT
date, platform, platform_type, flow,
'step_5' AS step,
step_5 AS step_sessions
FROM p1_funnel
UNION ALL
SELECT
date, platform, platform_type, flow,
'step_6' AS step,
step_6 AS step_sessions
FROM p1_funnel
), p4_funnel AS (
SELECT
main.date,
platform, platform_type, flow,
step,
step_1,
step_2,
step_3,
step_4,
step_5,
step_6,
step_sessions
FROM p3_funnel AS main
JOIN p2_funnel USING(date, platform, platform_type, flow)
), funnel_platform_type AS (
SELECT
date,
'platform_type' AS dimension,
platform_type AS value,
step,
step_1,
step_2,
step_3,
step_4,
step_5,
step_6,
step_sessions
FROM p4_funnel
), funnel_platform AS (
SELECT
date,
'platform' AS dimension,
platform AS value,
step,
step_1,
step_2,
step_3,
step_4,
step_5,
step_6,
step_sessions
FROM p4_funnel
), funnel_flow AS (
SELECT
date,
'flow' AS dimension,
flow AS value,
step,
step_1,
step_2,
step_3,
step_4,
step_5,
step_6,
step_sessions
FROM p4_funnel
), p5_funnel AS (
SELECT * FROM funnel_platform_type UNION ALL
SELECT * FROM funnel_platform UNION ALL
SELECT * FROM funnel_flow # including this UNION ALL first introduces the problem
)
SELECT
date,
dimension,
ROW_NUMBER() OVER (PARTITION BY dimension, step ORDER BY step_1 DESC) AS dim_order,
value,
step,
CASE
WHEN step = 'step_1' THEN 1
WHEN step = 'step_2' THEN 2
WHEN step = 'step_3' THEN 3
WHEN step = 'step_4' THEN 4
WHEN step = 'step_5' THEN 5
WHEN step = 'step_6' THEN 6
ELSE null
END AS step_order,
CASE
WHEN step = 'step_1' THEN step_2
WHEN step = 'step_2' THEN step_3
WHEN step = 'step_3' THEN step_4
WHEN step = 'step_4' THEN step_5
WHEN step = 'step_5' THEN step_6
WHEN step = 'step_6' THEN null
ELSE null
END AS next_step_sessions,
step_1,
step_2,
step_3,
step_4,
step_5,
step_6,
step_sessions
FROM p5_funnel
What does UNION mean in BigQuery? UNION combines the results of two or more queries vertically, stacking the rows of each result set under a single set of columns. If you've ever written UNION queries in SQL, BigQuery's UNION queries will look familiar.
Since each of the tables contains the same columns in the same order, we don't need to specify anything extra in either the SELECT clause or the filter options that follow; BigQuery is intelligent enough to translate this query into a UNION ALL that combines all the results into one dataset.
1. Avoid SELECT *. When you run a query with SELECT *, BigQuery has to read every column of the table. If you query only certain columns with SELECT col1, col2, col3…, BigQuery only needs to retrieve data for the selected columns.
The suggestion is to use temporary tables instead of a large number of WITH clauses. Breaking the query into a few simpler queries and persisting the intermediate results in short-lived or temporary tables should help to resolve this error.
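As a rough illustration of that suggestion, the question's query could be split into a multi-statement script along the following lines. This is only a sketch: it covers the first two steps rather than all six, reuses the CTE names from the question as temp table names, and assumes the query is run as a script so that CREATE TEMP TABLE is available.

```sql
-- Sketch only: step_3 to step_6 are omitted for brevity.
-- Statement 1: materialize the aggregation once.
CREATE TEMP TABLE p1_funnel AS
SELECT
  date, flow, platform_type, platform,
  SUM(step_1) AS step_1,
  SUM(step_2) AS step_2
FROM `project.dataset.source_view`
GROUP BY date, flow, platform_type, platform;

-- Statement 2: stack the steps, reading the temp table
-- instead of re-expanding a WITH subquery in every branch.
CREATE TEMP TABLE p3_funnel AS
SELECT date, platform, platform_type, flow,
       'step_1' AS step, step_1 AS step_sessions
FROM p1_funnel
UNION ALL
SELECT date, platform, platform_type, flow,
       'step_2' AS step, step_2 AS step_sessions
FROM p1_funnel;

-- Statement 3: later statements query the temp tables directly.
SELECT date, platform, platform_type, flow, step, step_sessions
FROM p3_funnel;

-- Optional: temp tables are cleaned up automatically,
-- but they can also be dropped explicitly.
DROP TABLE p3_funnel;
DROP TABLE p1_funnel;
```

Each statement is planned on its own, so no single statement has to carry the whole chain of subqueries.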
The WITH clause contains one or more named subqueries which execute every time a subsequent SELECT statement references them. Any clause or subquery can reference subqueries you define in the WITH clause. This includes any SELECT statements on either side of a set operator, such as UNION.
The WITH clause is useful primarily for readability, because BigQuery does not materialize the result of the queries inside the WITH clause. If a query appears in more than one WITH clause, it executes in each clause.
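Since every reference to a WITH subquery is expanded again, the six-branch p3_funnel in the question plans p1_funnel six times. One possible way to reduce that, sketched below, is to unpivot with a single CROSS JOIN UNNEST over an array of structs, so p1_funnel is referenced once. The inline sample row is made up purely so the snippet runs on its own; whether this is enough to avoid the planning error in the real query is not something the sketch can confirm.

```sql
WITH p1_funnel AS (
  -- Hypothetical sample row standing in for the real aggregation.
  SELECT DATE '2024-01-01' AS date, 'web' AS platform,
         'desktop' AS platform_type, 'signup' AS flow,
         100 AS step_1, 80 AS step_2, 60 AS step_3,
         40 AS step_4, 20 AS step_5, 10 AS step_6
)
SELECT
  date, platform, platform_type, flow,
  s.step,
  step_1, step_2, step_3, step_4, step_5, step_6,
  s.step_sessions
FROM p1_funnel
-- One output row per array element: six rows per input row.
CROSS JOIN UNNEST([
  STRUCT('step_1' AS step, step_1 AS step_sessions),
  STRUCT('step_2' AS step, step_2 AS step_sessions),
  STRUCT('step_3' AS step, step_3 AS step_sessions),
  STRUCT('step_4' AS step, step_4 AS step_sessions),
  STRUCT('step_5' AS step, step_5 AS step_sessions),
  STRUCT('step_6' AS step, step_6 AS step_sessions)
]) AS s
```

Because the wide step_1 to step_6 columns are still available on each row, this produces the same columns as p4_funnel without the join back to p2_funnel.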
This happens because of BigQuery's limit on subqueries, not because of UNION ALL itself. I faced the same problem when trying to execute a query with more than 125 subqueries. Try splitting your query by subquery count and inserting each part into a temp table, then collect the data from the temp tables and drop them when you finish.
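If the aim is simply to lower the subquery count, the three per-dimension subqueries and the final UNION ALL in p5_funnel could also be collapsed into one pass over p4_funnel. The sketch below assumes platform_type, platform and flow share one type (STRING here), which the original UNION ALL needs as well; the sample row and its trimmed column list are hypothetical.

```sql
WITH p4_funnel AS (
  -- Hypothetical sample row with a reduced set of columns.
  SELECT DATE '2024-01-01' AS date, 'web' AS platform,
         'desktop' AS platform_type, 'signup' AS flow,
         'step_1' AS step, 100 AS step_1, 100 AS step_sessions
)
SELECT
  f.date,
  d.dimension,
  d.value,
  f.step,
  f.step_1,
  f.step_sessions
FROM p4_funnel AS f
-- One output row per dimension: three rows per input row.
CROSS JOIN UNNEST([
  STRUCT('platform_type' AS dimension, f.platform_type AS value),
  STRUCT('platform' AS dimension, f.platform AS value),
  STRUCT('flow' AS dimension, f.flow AS value)
]) AS d
```

This replaces four subqueries (funnel_platform_type, funnel_platform, funnel_flow and p5_funnel) with one.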