How can I assign surrogate keys when inserting records in a BigQuery table? Something like using Sequence to generate unique values or NextVal ?
Google BigQuery has no primary key or unique constraints.
Description. Returns a random universally unique identifier (UUID) as a STRING . The returned STRING consists of 32 hexadecimal digits in five groups separated by hyphens in the form 8-4-4-4-12. The hexadecimal digits represent 122 random bits and 6 fixed bits, in compliance with RFC 4122 section 4.4.
However, Google BigQuery does not support Primary Key and Foreign Key constraints.
BigQuery Regexp Functions Regular expressions are a pattern or a sequence of characters that allows you to match, search and replace or validate a string input.
If you are looking to generate surrogate key values in BigQuery then it is best to avoid the ROW_NUMBER OVER () option and its variants. To quote the BigQuery post about surrogate keys:
To implement ROW_NUMBER(), BigQuery needs to sort values at the root node of the execution tree, which is limited by the amount of memory in one execution node.
This will always cause you to run into issues when you have even a mild amount of records.
There are two alternatives:
Option 1 - GENERATE_UUID()
Since a surrogate key has no business meaning and is just a unique key generated to be used in the data warehouse you can simply generate them using the GENERATE_UUID()
function call in BigQuery. This gives you a universally unique UUID which you can use as a surrogate key value.
One drawback is that this key will be 32 bites instead of an 8 byte INT64 value. So if you have a massive amount of records this could increase the storage size of your data.
Option 2 - Generate a unique Hash
The second option is to use a hash function to generate a unique has. This is a little more involved as you would need to find the combination of columns and or random other input to make sure you can never generate the same value twice.
Some hash functions will also output a 32 byte value so you wont save on storage but the FARM_FINGERPRINT() hash function will output an INT64 value which could save some storage. You could thus utilise options 1 and option 2 to generate a unique integer surrogate key by doing the following:
FARM_FINGERPRINT(GENERATE_UUID())
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With