I need to select multiple columns as part of a LEAD statement. This looks like it will be really inefficient, tripling the number of sorts and partitions required:
SELECT
  field,
  field2,
  field3,
  LEAD(field, 1) OVER (PARTITION BY field ORDER BY field ASC) AS nextField,
  LEAD(field2, 1) OVER (PARTITION BY field ORDER BY field ASC) AS nextField2,
  LEAD(field3, 1) OVER (PARTITION BY field ORDER BY field ASC) AS nextField3,
FROM dataset.table
Window functions, or analytic functions as they're called in BigQuery, compute values over a group of rows. They return a single value for each row, in contrast to aggregate functions, which return a single value for a group of rows.
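The per-row versus per-group difference can be seen in a small sketch. It uses Python's built-in sqlite3 module rather than BigQuery, purely so it can be run locally; the `sales` table and its columns are hypothetical, and it assumes the bundled SQLite is version 3.25 or later (the first release with window functions).

```python
import sqlite3

# Hypothetical table, used only for illustration (SQLite, not BigQuery).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 20), ("west", 5)],
)

# Aggregate function: one output row per group.
aggregate_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Window function: one output row per input row, each carrying its group's total.
window_rows = conn.execute(
    "SELECT region, amount, SUM(amount) OVER (PARTITION BY region) "
    "FROM sales ORDER BY region, amount"
).fetchall()

conn.close()
```

Here `aggregate_rows` has two rows (one per region), while `window_rows` keeps all three input rows.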
As with standard tables, you can create and use partitioned tables in BigQuery, and your charges will be based on the data stored in the partitions and the queries you run against them.
When you create a table partitioned by ingestion time, BigQuery automatically assigns rows to partitions based on the time when BigQuery ingests the data. You can choose hourly, daily, monthly, or yearly granularity for the partitions. Partition boundaries are based on UTC time.
A couple of points to add to Mikhail's answer:
Yes, BigQuery optimizes this: if the window frame is the same, it is set up only once and multiple functions run over it.
You are right that it is tedious to write the same frame over and over again. We have therefore been working on making the BigQuery SQL dialect more standards-compliant, and in the near future* you will be able to write
SELECT
  field,
  field2,
  field3,
  LEAD(field, 1) OVER w1 AS nextField,
  LEAD(field2, 1) OVER w1 AS nextField2,
  LEAD(field3, 1) OVER w1 AS nextField3,
FROM dataset.table
WINDOW w1 AS (PARTITION BY field ORDER BY field ASC)
*I cannot really give you a firm date, but this is in internal testing right now, so it should not be too long.
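To see that a named window is only shorthand for repeating the same OVER clause, here is a minimal sketch that runs both forms and compares the results. It uses Python's built-in sqlite3 module instead of BigQuery, so it demonstrates the standard SQL WINDOW clause, not BigQuery's implementation or its optimizer. The table `t` and its rows are hypothetical. The sketch orders by field2 rather than field so that row order within each partition is deterministic, and it assumes the bundled SQLite is version 3.25 or later.

```python
import sqlite3

# Hypothetical sample data; column names mirror the query in the question.
ROWS = [
    ("a", 1, 10),
    ("a", 2, 20),
    ("a", 3, 30),
    ("b", 1, 40),
    ("b", 2, 50),
]

# Form 1: the window specification is repeated for every function.
REPEATED = """
SELECT field, field2, field3,
       LEAD(field2, 1) OVER (PARTITION BY field ORDER BY field2 ASC) AS nextField2,
       LEAD(field3, 1) OVER (PARTITION BY field ORDER BY field2 ASC) AS nextField3
FROM t
ORDER BY field, field2
"""

# Form 2: the window is defined once and referenced by name.
NAMED = """
SELECT field, field2, field3,
       LEAD(field2, 1) OVER w1 AS nextField2,
       LEAD(field3, 1) OVER w1 AS nextField3
FROM t
WINDOW w1 AS (PARTITION BY field ORDER BY field2 ASC)
ORDER BY field, field2
"""


def run(query):
    """Load the sample rows into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (field TEXT, field2 INTEGER, field3 INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


repeated_rows = run(REPEATED)
named_rows = run(NAMED)
```

Both queries return identical rows; the last row of each partition has NULL (Python `None`) in the LEAD columns because there is no following row.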