Re-use BigQuery window function partition

I need to fetch the next value of several columns with LEAD. Writing a separate LEAD for each column looks like it will be really inefficient, tripling the number of sorts and partitions required:

SELECT 
    field,
    field2,
    field3,
    LEAD(field, 1) OVER (PARTITION BY field ORDER BY field ASC) AS nextField,
    LEAD(field2, 1) OVER (PARTITION BY field ORDER BY field ASC) AS nextField2,
    LEAD(field3, 1) OVER (PARTITION BY field ORDER BY field ASC) AS nextField3
FROM dataset.table
  • Is there a better way to do this?
  • Does BigQuery optimize for this at query runtime to make it efficient?
asked Feb 19 '16 at 19:02 by Charles Offenbacher



1 Answer

A couple of points to add to Mikhail's answer:

  1. Yes, BigQuery optimizes this: if the window frame is the same, it is set up only once and all the functions are evaluated over it.

  2. You are right that it is tedious to write the same frame over and over again. That is why we have been working on making the BigQuery SQL dialect more standards-compliant, and in the near future* you will be able to write:


SELECT 
    field,
    field2,
    field3,
    LEAD(field, 1) OVER w1 AS nextField,
    LEAD(field2, 1) OVER w1 AS nextField2,
    LEAD(field3, 1) OVER w1 AS nextField3
FROM dataset.table
WINDOW w1 AS (PARTITION BY field ORDER BY field ASC)

*I can't give you a firm date, but this is in internal testing right now, so it shouldn't be too long.
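To try the named-window form locally, here is a minimal sketch that runs the same kind of query through Python's built-in sqlite3 module, with SQLite standing in for BigQuery. It assumes SQLite 3.25 or newer (the release that added window functions). The table t, the columns grp, ts, field2 and field3, and the sample rows are all hypothetical. The partition and sort columns are deliberately different: ordering by the partition key itself, as in the queries above, makes every row in a partition tie, so the "next" row is not well defined.

```python
import sqlite3

# Hypothetical sample rows standing in for dataset.table: (grp, ts, field2, field3).
ROWS = [
    ("a", 1, "x1", 10),
    ("a", 2, "x2", 20),
    ("a", 3, "x3", 30),
    ("b", 1, "y1", 40),
    ("b", 2, "y2", 50),
]

# Three LEAD calls share one named window, w1, defined once in the WINDOW
# clause. Partition key (grp) and sort key (ts) differ, so "next row" is well defined.
QUERY = """
SELECT
    grp,
    ts,
    LEAD(ts, 1)     OVER w1 AS next_ts,
    LEAD(field2, 1) OVER w1 AS next_field2,
    LEAD(field3, 1) OVER w1 AS next_field3
FROM t
WINDOW w1 AS (PARTITION BY grp ORDER BY ts ASC)
ORDER BY grp, ts
"""


def next_values(rows):
    """Load rows into an in-memory table and return the LEAD results as tuples."""
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25+, found " + sqlite3.sqlite_version)
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (grp TEXT, ts INTEGER, field2 TEXT, field3 INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in next_values(ROWS):
        print(row)
```

Each row gets the next ts, field2 and field3 from its own partition, and the last row of each partition gets NULL (None in Python) in all three columns. This only shows that the shared definition gives the same results as repeating it; it says nothing about how SQLite or BigQuery plans the sorts.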

answered Oct 14 '22 at 08:10 by Mosha Pasumansky