Is there something like AUTO_INCREMENT, SERIAL, IDENTITY or a sequence in BigQuery?
I'm aware of ROW_NUMBER https://cloud.google.com/bigquery/query-reference#row-number
But I want to persist a generated unique ID for every row in my table.
Google BigQuery has no primary key or unique constraints.
A surrogate key has no business meaning; it is just a unique value generated for use in the data warehouse. You can therefore generate one with the GENERATE_UUID() function in BigQuery. It returns a random UUID as a string, which you can use as the surrogate key value.
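As a minimal sketch of persisting such a key, the query below copies a table and adds a UUID column. The names `my_dataset.my_table`, `my_dataset.my_table_with_id` and `id` are placeholders, not anything from the question.

```sql
-- Assumption: my_dataset.my_table is a placeholder for your source table.
-- Writes a copy of the table with one extra STRING column holding a UUID per row.
CREATE TABLE `my_dataset.my_table_with_id` AS
SELECT
  GENERATE_UUID() AS id,  -- random version 4 UUID, generated independently per row
  t.*
FROM `my_dataset.my_table` AS t;
```

Writing to a new table rather than replacing the original keeps the source intact until you have checked the result.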
BigQuery does not enforce referential integrity (primary and foreign keys), so nothing stops duplicate key values from being stored.
I would recommend superQuery. It's a full-fledged, web-based IDE for Google BigQuery with the features you would expect in a modern IDE, including multi-tab editing, collaboration, git integration, cost controls, result visualizations, shareable boards, auto-complete and more.
BigQuery does not have a notion of row key generation at load time. You could rewrite the table with a query to generate arbitrary keys for your rows.
As you noted, ROW_NUMBER would give you a unique index for each row, but you may hit size limits for particularly large tables (since you'd need an unpartitioned window function over everything).
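For a table small enough to number in one pass, the ROW_NUMBER approach could look roughly like this. The table and column names are placeholders.

```sql
-- Assumption: my_dataset.my_table is a placeholder and is small enough
-- for a single unpartitioned window to process.
CREATE TABLE `my_dataset.my_table_numbered` AS
SELECT
  ROW_NUMBER() OVER () AS row_id,  -- 1, 2, 3, ... in no guaranteed order
  t.*
FROM `my_dataset.my_table` AS t;
```

With an empty OVER () clause the numbering order is arbitrary. Add an ORDER BY inside the window if the order matters, keeping in mind that this makes the size concern above more likely to bite.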
If you can tolerate a larger string key, you might consider generating a UUID for each row (which can be done randomly and doesn't require coordination with the rest of your data). If you're using Standard SQL (and you should!) the GENERATE_UUID() function will accomplish this.
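If you would rather keep the existing table and backfill a key column in place, a sketch along these lines may work. The table name and the `id` column are assumptions, and DML quotas and costs apply.

```sql
-- Assumption: my_dataset.my_table is a placeholder; id is a new, hypothetical column.
ALTER TABLE `my_dataset.my_table`
ADD COLUMN id STRING;

-- BigQuery requires a WHERE clause on UPDATE; this one targets rows without a key yet.
UPDATE `my_dataset.my_table`
SET id = GENERATE_UUID()
WHERE id IS NULL;
```

Rows loaded later will not get a key automatically, so the UPDATE would have to be rerun after each load, or the key generated in the load query itself.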
In the linked answer, Felipe constructs a composite key, which may also work for you, if the combination of your keys is distinct.
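A composite key could be sketched as follows. The columns `user_id` and `event_ts` are hypothetical stand-ins for whichever fields are jointly distinct in your data, and this is not necessarily how the linked answer builds its key.

```sql
-- Assumption: user_id and event_ts are hypothetical columns of my_dataset.my_table.
-- First check that the combination really is distinct: both counts should match.
SELECT
  COUNT(*) AS total_rows,
  COUNT(DISTINCT CONCAT(CAST(user_id AS STRING), '|', CAST(event_ts AS STRING))) AS distinct_keys
FROM `my_dataset.my_table`;

-- Then materialize the key alongside the original columns.
CREATE TABLE `my_dataset.my_table_keyed` AS
SELECT
  CONCAT(CAST(user_id AS STRING), '|', CAST(event_ts AS STRING)) AS composite_key,
  t.*
FROM `my_dataset.my_table` AS t;
```

Note that CONCAT returns NULL if any argument is NULL, so rows with a missing key part would need separate handling.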