Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Create a column of UUIDs in Google BigQuery

Google BigQuery doesn't support UUID as data type. So, which option is better to store it:

  • STRING: String with the format 8-4-4-4-12
  • BYTES: Array of 16 bytes (128 bits)
like image 565
tashuhka Avatar asked Mar 21 '18 11:03

tashuhka


People also ask

What is UUID in BigQuery?

Description. Returns a random universally unique identifier (UUID) as a STRING . The returned STRING consists of 32 hexadecimal digits in five groups separated by hyphens in the form 8-4-4-4-12. The hexadecimal digits represent 122 random bits and 6 fixed bits, in compliance with RFC 4122 section 4.4.

How do I create a dynamic table in BigQuery?

A BigQuery job can create the destination table if not exist. But to create dynamic tables you'll need to give the destination table a different name every day. Or you can use the partitioned tables, then the destination table name can remain the same, and a new partition will be automatically created every day.


1 Answers

Edit: BigQuery now supports a function called GENERATE_UUID. This returns a STRING with 32 hexadecimal digits in five groups separated by hyphens in the form 8-4-4-4-12.

Original content:

Some discussion of the tradeoffs:

Using STRING

  • UUIDs are compatible with the representation in other systems, such as if you export to CSV and then want to merge with exports from elsewhere.
  • UUIDs are compatible with BigQuery's probably UUID implementation. You will be able to generate UUIDs of this same form using a function (when the feature is implemented).
  • If you decide to represent the UUIDs as BYTES later, you can potentially convert using a UDF.
  • Downside: Comparisons may not be as fast as with BYTES depending on the operator, since string comparisons have to take UTF-8 encoding into account. (It sounds like this isn't an issue for you).
  • Downside: Storage costs are higher. (It sounds like this isn't an issue for you).

Using BYTES

  • UUIDs are stored more compactly; storage is cheaper and comparisons are faster.
  • If you decide to represent the UUIDs as STRINGs later, you can potentially convert them using a UDF.
  • Downside: UUIDs are not compatible with other systems after export, and will likely not be compatible with BigQuery's implementation either.
like image 138
Elliott Brossard Avatar answered Sep 18 '22 00:09

Elliott Brossard