Row number in BigQuery?

Tags: google-bigquery

Is there any way to get a row number for each record in BigQuery? (I haven't seen anything about it in the specs.) There is an NTH() function, but that applies to repeated fields.

There are some scenarios where a row number is not necessary in BigQuery, such as when using TOP() or LIMIT. However, I need it to simulate some analytical functions, such as a cumulative sum(). For that purpose I need to identify each record with a sequential number. Is there any workaround for this?

Thanks in advance for your help!

Leo

asked Jun 15 '12 19:06

Leo Stefa

1 Answer

2018 update: If all you want is a unique id for each row:

#standardSQL
SELECT GENERATE_UUID() uuid, *
FROM table

2018 #standardSQL solution:

SELECT
  ROW_NUMBER() OVER() row_number,
  contributor_username,
  count
FROM (
  SELECT contributor_username, COUNT(*) count
  FROM `publicdata.samples.wikipedia`
  GROUP BY contributor_username
  ORDER BY COUNT DESC
  LIMIT 5)

But what about "Resources exceeded during query execution: The query could not be executed in the allotted memory. OVER() operator used too much memory."?

Ok, let's reproduce that error:

SELECT *, ROW_NUMBER() OVER()
FROM `publicdata.samples.natality`

Yes, that happens because an OVER() with no partitioning needs to fit all the data into one VM. You can solve this with PARTITION BY, which lets each partition be numbered separately:

SELECT *, ROW_NUMBER() OVER(PARTITION BY year, month) rn
FROM `publicdata.samples.natality`

"But now many rows have the same row number and all I wanted was a different id for each row"

Ok, ok. Let's use partitions to give a row number to each row, and let's combine that row number with the partition fields to get a unique id per row:

SELECT *
  , FORMAT('%i-%i-%i', year, month, ROW_NUMBER() OVER(PARTITION BY year, month)) id
FROM `publicdata.samples.natality`
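To see why the combined id is unique even though the per-partition row numbers repeat, here is a minimal sketch that runs the same pattern locally. It assumes Python's built-in sqlite3 module (with SQLite 3.25 or later for window functions) as a stand-in for BigQuery; the table name births, its columns, and the sample rows are made up for illustration, and string concatenation replaces FORMAT().

```python
import sqlite3

# Hypothetical sample data standing in for the natality table:
# (year, month, weight). Only year and month matter for the id.
ROWS = [
    (2005, 1, 7.1),
    (2005, 1, 6.4),
    (2005, 2, 8.0),
    (2006, 1, 5.9),
    (2006, 1, 7.7),
]

# Same idea as the BigQuery query: number rows inside each
# (year, month) partition, then prefix the number with the partition key.
QUERY = """
SELECT
  ROW_NUMBER() OVER (PARTITION BY year, month) AS rn,
  year || '-' || month || '-' ||
    ROW_NUMBER() OVER (PARTITION BY year, month) AS id
FROM births
"""


def numbered_rows():
    """Load the sample rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE births (year INTEGER, month INTEGER, weight REAL)")
    conn.executemany("INSERT INTO births VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


rows = numbered_rows()
row_numbers = [rn for rn, _ in rows]
ids = [row_id for _, row_id in rows]

# Row numbers restart in every partition, so they repeat...
print(sorted(row_numbers))
# ...but the partition key plus the row number is unique per row.
print(len(set(ids)) == len(ids))
```

The row number alone restarts at 1 in every (year, month) group, so it only becomes an identifier once the partition fields are part of it.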

The original 2013 solution:

Good news: BigQuery now has a row_number function.

Simple example:

SELECT [field], ROW_NUMBER() OVER() FROM [table] GROUP BY [field]

More complex, working example:

SELECT
  ROW_NUMBER() OVER() row_number,
  contributor_username,
  count,
FROM (
  SELECT contributor_username, COUNT(*) count,
  FROM [publicdata:samples.wikipedia]
  GROUP BY contributor_username
  ORDER BY COUNT DESC
  LIMIT 5)
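Since the question mentions a cumulative sum, note that a windowed SUM() with an ORDER BY and a ROWS frame can produce a running total directly, alongside the row number. The sketch below is only an illustration: it again assumes Python's sqlite3 (SQLite 3.25 or later) rather than BigQuery, and the table contributor_counts, its columns, and the data are hypothetical.

```python
import sqlite3

# Hypothetical per-contributor edit counts (distinct values, so the
# ordering by edits is unambiguous).
COUNTS = [("alice", 50), ("bob", 30), ("carol", 20)]

# ROW_NUMBER() gives the sequential number; SUM() over the same ordering
# with a frame ending at the current row gives the cumulative sum.
QUERY = """
SELECT
  ROW_NUMBER() OVER (ORDER BY edits DESC) AS rn,
  username,
  edits,
  SUM(edits) OVER (
    ORDER BY edits DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM contributor_counts
ORDER BY rn
"""


def running_totals():
    """Build the sample table in memory and return QUERY's rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contributor_counts (username TEXT, edits INTEGER)")
    conn.executemany("INSERT INTO contributor_counts VALUES (?, ?)", COUNTS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


totals = running_totals()
for rn, username, edits, running_total in totals:
    print(rn, username, edits, running_total)
```

As with ROW_NUMBER() OVER(), a window without PARTITION BY processes all rows together, so the same memory caveat may apply on large tables.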

answered Sep 28 '22 10:09

Felipe Hoffa
