Re-use BigQuery window function partition

Tags: google-bigquery

I need to select multiple columns using LEAD. This looks like it will be really inefficient, since it appears to triple the number of sorts and partitions required:

SELECT 
    field,
    field2,
    field3,
    LEAD(field, 1) OVER (PARTITION BY field ORDER BY field ASC) AS nextField,
    LEAD(field2, 1) OVER (PARTITION BY field ORDER BY field ASC) AS nextField2,
    LEAD(field3, 1) OVER (PARTITION BY field ORDER BY field ASC) AS nextField3,
FROM dataset.table

Is there a better way to do this?
Does BigQuery optimize for this at query runtime to make it efficient?


asked Feb 19 '16 19:02

Charles Offenbacher

1 Answer

A couple of points to add to Mikhail's answer:

Yes, BigQuery optimizes this: if the window specification (the PARTITION BY and ORDER BY) is the same, it is set up only once and multiple functions run over it.
You are right that it is tedious to write the same window over and over again. We have therefore been working on making the BigQuery SQL dialect more standards-compliant, and in the near future* you will be able to write:


SELECT 
    field,
    field2,
    field3,
    LEAD(field, 1) OVER w1 AS nextField,
    LEAD(field2, 1) OVER w1 AS nextField2,
    LEAD(field3, 1) OVER w1 AS nextField3,
FROM dataset.table
WINDOW w1 AS (PARTITION BY field ORDER BY field ASC)

*I cannot really give you a firm date, but this is in internal testing right now, so it should not be too long.
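
To try the named-window spelling locally, here is a minimal sketch that checks the two forms return the same rows. It makes several assumptions that are not part of the question: it uses Python's built-in sqlite3 module as a stand-in engine (it needs a SQLite build with window-function support, 3.25 or newer), the table `t` and its three rows are made up, and the window orders by `field2` rather than `field` so that the row order inside each partition is deterministic. It only demonstrates that the syntax is equivalent; it says nothing about how BigQuery plans or optimizes the query.

```python
import sqlite3

# Hypothetical in-memory table; SQLite is only a local stand-in for BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field TEXT, field2 INTEGER, field3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", 1, "x"), ("a", 2, "y"), ("b", 1, "z")],
)

# Window specification repeated inline for every LEAD call.
inline_sql = """
SELECT field, field2, field3,
       LEAD(field2, 1) OVER (PARTITION BY field ORDER BY field2) AS nextField2,
       LEAD(field3, 1) OVER (PARTITION BY field ORDER BY field2) AS nextField3
FROM t
ORDER BY field, field2
"""

# Same query with the window specification named once in a WINDOW clause.
named_sql = """
SELECT field, field2, field3,
       LEAD(field2, 1) OVER w1 AS nextField2,
       LEAD(field3, 1) OVER w1 AS nextField3
FROM t
WINDOW w1 AS (PARTITION BY field ORDER BY field2)
ORDER BY field, field2
"""

inline_rows = conn.execute(inline_sql).fetchall()
named_rows = conn.execute(named_sql).fetchall()

# Both spellings should produce identical rows.
assert inline_rows == named_rows

for row in named_rows:
    print(row)
```

The last row of each partition gets NULL (None in Python) for the LEAD columns, because there is no following row in that partition.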

answered Oct 14 '22 08:10

Mosha Pasumansky
