
BigQuery complains after one too many UNION ALL operations - why does it happen and what are my options?

I'm developing what is turning out to be quite a complex query that requires me to stack data (i.e. UNION ALL) many times. To my surprise, BigQuery doesn't like the stacking, and the dry run shows this exception:

Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex.

I've isolated the point in the query where the problem arises to confirm that it appears to be one too many UNION ALL causing the problem. I'm surprised that the UNION ALL would do this, but I suspect I'm naive in my thinking here.

  1. Why isn't BigQuery able to handle this additional UNION ALL? Isn't stacking data one of the more straightforward operations?

  2. What are my options to achieve the same result? Is there an operation that I'm not aware of that could do the same job or an alternative method?

Here's the query in full, although I should note that project.dataset.source_view is a view that does some relatively straightforward processing of its own first:

WITH p0_funnel AS (
  SELECT  
    date, 
    platform_type, 
    platform, 
    flow,
    step_1, 
    step_2, 
    step_3, 
    step_4, 
    step_5, 
    step_6
  FROM `project.dataset.source_view`
), p1_funnel AS (
  SELECT
    date,
    flow,
    platform_type,
    platform,
    SUM(step_1) AS step_1, 
    SUM(step_2) AS step_2, 
    SUM(step_3) AS step_3, 
    SUM(step_4) AS step_4, 
    SUM(step_5) AS step_5, 
    SUM(step_6) AS step_6
  FROM p0_funnel
  GROUP BY 
    date, 
    flow,
    platform_type,
    platform
), p2_funnel AS (
  SELECT
    date,
    flow,
    platform,
    platform_type,
    step_1,
    step_2,
    step_3,
    step_4,
    step_5,
    step_6
  FROM p1_funnel
), p3_funnel AS (
  SELECT
    date, platform, platform_type, flow,
    'step_1' AS step,
    step_1 AS step_sessions
  FROM p1_funnel

  UNION ALL

  SELECT
    date, platform, platform_type, flow,
    'step_2' AS step,
    step_2 AS step_sessions
  FROM p1_funnel

  UNION ALL

  SELECT
    date, platform, platform_type, flow,
    'step_3' AS step,
    step_3 AS step_sessions
  FROM p1_funnel

  UNION ALL

  SELECT
    date, platform, platform_type, flow,
    'step_4' AS step,
    step_4 AS step_sessions
  FROM p1_funnel

  UNION ALL

  SELECT
    date, platform, platform_type, flow,
    'step_5' AS step,
    step_5 AS step_sessions
  FROM p1_funnel

  UNION ALL

  SELECT
    date, platform, platform_type, flow,
    'step_6' AS step,
    step_6 AS step_sessions
  FROM p1_funnel
), p4_funnel AS (
  SELECT
    main.date,
    platform, platform_type, flow,
    step,
    step_1,
    step_2,
    step_3,
    step_4,
    step_5,
    step_6,
    step_sessions
  FROM p3_funnel AS main
  JOIN p2_funnel USING(date, platform, platform_type, flow)

), funnel_platform_type AS (
  SELECT
    date,
    'platform_type' AS dimension,
    platform_type AS value,
    step,
    step_1,
    step_2,
    step_3,
    step_4,
    step_5,
    step_6,
    step_sessions
  FROM p4_funnel
), funnel_platform AS (
  SELECT
    date,
    'platform' AS dimension,
    platform AS value,
    step,
    step_1,
    step_2,
    step_3,
    step_4,
    step_5,
    step_6,
    step_sessions
  FROM p4_funnel
), funnel_flow AS (
  SELECT
    date,
    'flow' AS dimension,
    flow AS value,
    step,
    step_1,
    step_2,
    step_3,
    step_4,
    step_5,
    step_6,
    step_sessions
  FROM p4_funnel
), p5_funnel AS (
  SELECT * FROM funnel_platform_type UNION ALL
  SELECT * FROM funnel_platform UNION ALL
  SELECT * FROM funnel_flow # including this UNION ALL first introduces the problem
)

SELECT
  date,
  dimension,
  ROW_NUMBER() OVER (PARTITION BY dimension, step ORDER BY step_1 DESC) AS dim_order,
  value,
  step,
  CASE
    WHEN step = 'step_1' THEN 1
    WHEN step = 'step_2' THEN 2
    WHEN step = 'step_3' THEN 3
    WHEN step = 'step_4' THEN 4
    WHEN step = 'step_5' THEN 5
    WHEN step = 'step_6' THEN 6
    ELSE null
  END AS step_order,  
  CASE
    WHEN step = 'step_1' THEN step_2
    WHEN step = 'step_2' THEN step_3
    WHEN step = 'step_3' THEN step_4
    WHEN step = 'step_4' THEN step_5
    WHEN step = 'step_5' THEN step_6
    WHEN step = 'step_6' THEN null
    ELSE null
  END AS next_step_sessions,
  step_1,
  step_2,
  step_3,
  step_4,
  step_5,
  step_6,
  step_sessions
FROM p5_funnel

asked Dec 18 '18 17:12 by goose



2 Answers

A common suggestion is to use temporary tables instead of a long chain of WITH clauses. Breaking the query into a few simpler queries and persisting the intermediate results in short-lived or temporary tables should help resolve this error.

The WITH clause contains one or more named subqueries which execute every time a subsequent SELECT statement references them. Any clause or subquery can reference subqueries you define in the WITH clause. This includes any SELECT statements on either side of a set operator, such as UNION.

The WITH clause is useful primarily for readability, because BigQuery does not materialize the result of the queries inside the WITH clause. If a query appears in more than one WITH clause, it executes in each clause.
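
As a rough sketch of the temp-table approach applied to the query in the question (not the answerer's own code): the script below assumes BigQuery multi-statement queries are available and that the step columns and dimension columns have the types the original UNION ALLs already require. The temp table names simply reuse the CTE names from the question.

```sql
-- Run as one multi-statement query (script).

-- 1. Materialize the aggregated funnel once, so later statements read a
--    stored table instead of re-expanding the view for every reference.
CREATE TEMP TABLE p1_funnel AS
SELECT
  date, flow, platform_type, platform,
  SUM(step_1) AS step_1,
  SUM(step_2) AS step_2,
  SUM(step_3) AS step_3,
  SUM(step_4) AS step_4,
  SUM(step_5) AS step_5,
  SUM(step_6) AS step_6
FROM `project.dataset.source_view`
GROUP BY date, flow, platform_type, platform;

-- 2. Stack the six steps and join the wide columns back on
--    (the p3_funnel and p4_funnel stages of the original query).
CREATE TEMP TABLE p4_funnel AS
WITH p3_funnel AS (
  SELECT date, platform, platform_type, flow, 'step_1' AS step, step_1 AS step_sessions FROM p1_funnel
  UNION ALL
  SELECT date, platform, platform_type, flow, 'step_2', step_2 FROM p1_funnel
  UNION ALL
  SELECT date, platform, platform_type, flow, 'step_3', step_3 FROM p1_funnel
  UNION ALL
  SELECT date, platform, platform_type, flow, 'step_4', step_4 FROM p1_funnel
  UNION ALL
  SELECT date, platform, platform_type, flow, 'step_5', step_5 FROM p1_funnel
  UNION ALL
  SELECT date, platform, platform_type, flow, 'step_6', step_6 FROM p1_funnel
)
SELECT
  date, platform, platform_type, flow, step,
  step_1, step_2, step_3, step_4, step_5, step_6,
  step_sessions
FROM p3_funnel
JOIN p1_funnel USING (date, platform, platform_type, flow);

-- 3. Stack the three dimensions from the stored table
--    (the p5_funnel stage of the original query).
CREATE TEMP TABLE p5_funnel AS
SELECT date, 'platform_type' AS dimension, platform_type AS value,
       * EXCEPT (date, platform, platform_type, flow)
FROM p4_funnel
UNION ALL
SELECT date, 'platform', platform,
       * EXCEPT (date, platform, platform_type, flow)
FROM p4_funnel
UNION ALL
SELECT date, 'flow', flow,
       * EXCEPT (date, platform, platform_type, flow)
FROM p4_funnel;

-- 4. Run the final SELECT from the question here; it can read
--    FROM p5_funnel unchanged. Then clean up.
DROP TABLE p5_funnel;
DROP TABLE p4_funnel;
DROP TABLE p1_funnel;
```

Each statement is planned on its own, so no single statement has to expand the whole chain of subqueries. How many stages to materialize is a judgment call; materializing only p1_funnel may already be enough.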

answered Oct 11 '22 18:10 by Sridhar Pothamsetti


This happens because of BigQuery's limit on subqueries, not because of UNION ALL itself. I hit the same problem when trying to execute a query with more than 125 subqueries. Try splitting your query into parts by subquery count, write each part to a temp table, then combine the data from the temp tables and drop them when you finish.
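
Separately from temp tables, reducing the number of subqueries may also keep the query under the limit. The following is only a sketch, not something proposed in the answers: it replaces both UNION ALL stacks with correlated UNNEST over arrays of structs, so p1_funnel is referenced once. It assumes the six step columns share a numeric type and that platform_type, platform and flow share a type (for example STRING), which the original UNION ALLs require as well.

```sql
WITH p1_funnel AS (
  SELECT
    date, flow, platform_type, platform,
    SUM(step_1) AS step_1,
    SUM(step_2) AS step_2,
    SUM(step_3) AS step_3,
    SUM(step_4) AS step_4,
    SUM(step_5) AS step_5,
    SUM(step_6) AS step_6
  FROM `project.dataset.source_view`
  GROUP BY date, flow, platform_type, platform
), p5_funnel AS (
  SELECT
    date,
    d.dimension,
    d.value,
    s.step,
    step_1, step_2, step_3, step_4, step_5, step_6,
    s.step_sessions
  FROM p1_funnel
  -- One output row per step (replaces the six-way UNION ALL).
  CROSS JOIN UNNEST([
    STRUCT('step_1' AS step, step_1 AS step_sessions),
    STRUCT('step_2' AS step, step_2 AS step_sessions),
    STRUCT('step_3' AS step, step_3 AS step_sessions),
    STRUCT('step_4' AS step, step_4 AS step_sessions),
    STRUCT('step_5' AS step, step_5 AS step_sessions),
    STRUCT('step_6' AS step, step_6 AS step_sessions)
  ]) AS s
  -- One output row per dimension (replaces the three-way UNION ALL).
  CROSS JOIN UNNEST([
    STRUCT('platform_type' AS dimension, platform_type AS value),
    STRUCT('platform' AS dimension, platform AS value),
    STRUCT('flow' AS dimension, flow AS value)
  ]) AS d
)
-- The final SELECT from the question can read FROM p5_funnel unchanged.
SELECT * FROM p5_funnel
```

This should yield the same 18 rows per p1_funnel row as the original (6 steps times 3 dimensions), as long as the four grouping keys are never NULL; the original self-join with USING would drop rows with NULL keys, while this version keeps them.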

answered Oct 11 '22 18:10 by Алексей Фастовец