Is there any way to get a row number for each record in BigQuery? (I haven't seen anything about it in the specs.) There is an NTH() function, but that applies to repeated fields.
There are some scenarios where a row number is not necessary in BigQuery, such as when using TOP() or LIMIT. However, I need it to simulate some analytical functions, such as a cumulative sum. For that purpose I need to identify each record with a sequential number. Is there any workaround for this?
Thanks in advance for your help!
Leo
The ROW_NUMBER() function in BigQuery: ROW_NUMBER() is a numbering function, which is a subset of the analytic functions in BigQuery. An analytic function needs an OVER clause, which defines a window of rows within the query result set. Within each window, ROW_NUMBER() assigns a unique sequential number to every row.
ROW_NUMBER() gives consecutive numbers even when rows are duplicates, while RANK() and DENSE_RANK() give tied rows the same rank. After a tie, RANK() skips ahead by the number of tied rows, so you will see a jump in the numbering, whereas DENSE_RANK() leaves no gaps in the rankings.
Description of RANK(): returns the ordinal (1-based) rank of each row within the ordered partition. All peer rows receive the same rank value. The next row or set of peer rows receives a rank value that increments by the number of peers with the previous rank value, unlike DENSE_RANK(), which always increments by 1.
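To make the difference between the three numbering functions concrete, here is a small sketch. The `scores` data, the column names and the aliases are made up for illustration; the tie on 20 is what makes the functions diverge.

```sql
#standardSQL
-- Hypothetical inline data: four scores, with a tie on 20.
WITH scores AS (
  SELECT 'a' AS player, 10 AS score UNION ALL
  SELECT 'b', 20 UNION ALL
  SELECT 'c', 20 UNION ALL
  SELECT 'd', 30
)
SELECT
  player,
  score,
  -- Always unique; player is added as a tie-breaker so the order is stable.
  ROW_NUMBER() OVER (ORDER BY score, player) AS row_num,
  -- Tied rows share a rank, and the next rank skips ahead.
  RANK() OVER (ORDER BY score) AS rnk,
  -- Tied rows share a rank, and the next rank does not skip.
  DENSE_RANK() OVER (ORDER BY score) AS dense_rnk
FROM scores
ORDER BY score, player
```

With this data, row_num should come out as 1, 2, 3, 4, rnk as 1, 2, 2, 4 and dense_rnk as 1, 2, 2, 3.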
BigQuery supports casting to NUMERIC.
2018 update: If all you want is a unique id for each row:
#standardSQL
SELECT GENERATE_UUID() uuid, *
FROM table
2018 #standardSQL solution:
SELECT ROW_NUMBER() OVER() row_number, contributor_username, count
FROM (
  SELECT contributor_username, COUNT(*) count
  FROM `publicdata.samples.wikipedia`
  GROUP BY contributor_username
  ORDER BY count DESC
  LIMIT 5)
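One caveat about the query above: as far as I know, an empty OVER() does not promise that the numbers follow the ORDER BY of the subquery. If the numbering has to match the ranking, a variant along these lines puts the ordering inside the window. The aliases `edits` and `row_num` are my own choice, and the username tie-breaker is an assumption to keep the result stable.

```sql
#standardSQL
SELECT
  -- Number the rows by edit count, highest first; the username breaks ties.
  ROW_NUMBER() OVER (ORDER BY edits DESC, contributor_username) AS row_num,
  contributor_username,
  edits
FROM (
  SELECT contributor_username, COUNT(*) AS edits
  FROM `publicdata.samples.wikipedia`
  GROUP BY contributor_username
  ORDER BY edits DESC
  LIMIT 5
)
ORDER BY row_num
```

Because the inner query is already down to 5 rows, the ordered window is cheap here.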
But what about "Resources exceeded during query execution: The query could not be executed in the allotted memory. OVER() operator used too much memory."?
Ok, let's reproduce that error:
SELECT *, ROW_NUMBER() OVER() FROM `publicdata.samples.natality`
Yes, that happens because an OVER() with no partitioning needs to fit all the data into one VM. You can solve that with PARTITION BY:
SELECT *, ROW_NUMBER() OVER(PARTITION BY year, month) rn FROM `publicdata.samples.natality`
"But now many rows have the same row number and all I wanted was a different id for each row"
Ok, ok. Let's use partitions to give a row number to each row, and let's combine that row number with the partition fields to get a unique id per row:
SELECT *,
  FORMAT('%i-%i-%i', year, month, ROW_NUMBER() OVER(PARTITION BY year, month)) id
FROM `publicdata.samples.natality`
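To see what the combined id looks like without scanning the whole table, here is the same idea on a few made-up rows. The `births` data is hypothetical and stands in for the real table; `%d` is used in FORMAT, which behaves like `%i` for integers.

```sql
#standardSQL
-- Hypothetical inline rows: two in 2005-01, one in 2005-02, one in 2006-01.
WITH births AS (
  SELECT 2005 AS year, 1 AS month UNION ALL
  SELECT 2005, 1 UNION ALL
  SELECT 2005, 2 UNION ALL
  SELECT 2006, 1
)
SELECT
  year,
  month,
  -- Restarts at 1 inside every (year, month) partition.
  ROW_NUMBER() OVER (PARTITION BY year, month) AS rn,
  -- Partition fields plus the row number give a distinct id per row.
  FORMAT('%d-%d-%d', year, month,
         ROW_NUMBER() OVER (PARTITION BY year, month)) AS id
FROM births
ORDER BY id
```

The ids should be 2005-1-1, 2005-1-2, 2005-2-1 and 2006-1-1. The separators matter: without them, month 1 with row 11 and month 11 with row 1 would produce the same string.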
The original 2013 solution (legacy SQL syntax):
Good news: BigQuery now has a row_number function.
Simple example:
SELECT [field], ROW_NUMBER() OVER() FROM [table] GROUP BY [field]
More complex, working example:
SELECT ROW_NUMBER() OVER() row_number, contributor_username, count,
FROM (
  SELECT contributor_username, COUNT(*) count,
  FROM [publicdata:samples.wikipedia]
  GROUP BY contributor_username
  ORDER BY count DESC
  LIMIT 5)
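Coming back to the cumulative sum mentioned in the question: in standard SQL you may not need a row number for that at all, since SUM() can itself be used as an analytic function. A rough sketch, assuming you want a running total of natality records per month within each year (the aliases `records` and `cumulative_records` are mine):

```sql
#standardSQL
SELECT
  year,
  month,
  records,
  -- Running total from the first month of the year up to the current row.
  SUM(records) OVER (
    PARTITION BY year
    ORDER BY month
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS cumulative_records
FROM (
  -- Aggregate first, so the window only sees one row per (year, month).
  SELECT year, month, COUNT(*) AS records
  FROM `publicdata.samples.natality`
  GROUP BY year, month
)
ORDER BY year, month
```

Partitioning by year keeps each window small, which is the same trick used above to avoid the memory error.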