Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I generate unique key values for records in a BigQuery table?

How can I assign surrogate keys when inserting records in a BigQuery table? Something like using Sequence to generate unique values or NextVal ?

like image 928
FZF Avatar asked Nov 17 '15 19:11

FZF


People also ask

Do BigQuery tables have primary keys?

Google BigQuery has no primary key or unique constraints.

What is UUID in BigQuery?

Description. Returns a random universally unique identifier (UUID) as a STRING . The returned STRING consists of 32 hexadecimal digits in five groups separated by hyphens in the form 8-4-4-4-12. The hexadecimal digits represent 122 random bits and 6 fixed bits, in compliance with RFC 4122 section 4.4.

Does BigQuery have foreign key?

However, Google BigQuery does not support Primary Key and Foreign Key constraints.

What is regex in BigQuery?

BigQuery Regexp Functions Regular expressions are a pattern or a sequence of characters that allows you to match, search and replace or validate a string input.


1 Answers

If you are looking to generate surrogate key values in BigQuery then it is best to avoid the ROW_NUMBER OVER () option and its variants. To quote the BigQuery post about surrogate keys:

To implement ROW_NUMBER(), BigQuery needs to sort values at the root node of the execution tree, which is limited by the amount of memory in one execution node.

This will always cause you to run into issues when you have even a mild amount of records.

There are two alternatives:

Option 1 - GENERATE_UUID()

Since a surrogate key has no business meaning and is just a unique key generated to be used in the data warehouse you can simply generate them using the GENERATE_UUID() function call in BigQuery. This gives you a universally unique UUID which you can use as a surrogate key value.

One drawback is that this key will be 32 bites instead of an 8 byte INT64 value. So if you have a massive amount of records this could increase the storage size of your data.

Option 2 - Generate a unique Hash

The second option is to use a hash function to generate a unique has. This is a little more involved as you would need to find the combination of columns and or random other input to make sure you can never generate the same value twice.

Some hash functions will also output a 32 byte value so you wont save on storage but the FARM_FINGERPRINT() hash function will output an INT64 value which could save some storage. You could thus utilise options 1 and option 2 to generate a unique integer surrogate key by doing the following: FARM_FINGERPRINT(GENERATE_UUID())

like image 54
Twist Avatar answered Nov 13 '22 09:11

Twist