<p>Google BigQuery doesn't support UUID as a data type. Which option is better for storing UUIDs?</p> <ul> <li> <code>STRING</code>: a string in the format 8-4-4-4-12</li> <li> <code>BYTES</code>: an array of 16 bytes (128 bits)</li> </ul>
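To make the two options concrete, here is a minimal sketch using Python's standard `uuid` module (not BigQuery itself) that produces both representations of the same UUID and compares their raw sizes. It only illustrates the payload; BigQuery's own per-value storage accounting may add some overhead on top of these numbers.

```python
import uuid

# Generate a random (version 4) UUID; any UUID works the same way.
u = uuid.uuid4()

# Option 1: STRING in the canonical 8-4-4-4-12 hex form (lower-case).
as_string = str(u)

# Option 2: BYTES, the raw 16-byte (128-bit) value.
as_bytes = u.bytes

# 32 hex digits + 4 hyphens = 36 characters, versus 16 raw bytes.
print(as_string, len(as_string))
print(as_bytes.hex(), len(as_bytes))
```

Both forms carry exactly the same 128 bits; the string simply spells each byte as two hex digits and adds four hyphens.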

<p>Edit: BigQuery now supports a function called <code>GENERATE_UUID</code>. It returns a <code>STRING</code> of 32 hexadecimal digits in five groups separated by hyphens, in the form 8-4-4-4-12.</p>
<p>Original content:</p>
<p>Some discussion of the tradeoffs:</p>
<h3>Using <code>STRING</code> </h3>
<ul>
<li>UUIDs are compatible with the representation in other systems, for example if you export to CSV and then want to merge with exports from elsewhere.</li>
<li>UUIDs are compatible with BigQuery's likely UUID implementation: you will be able to generate UUIDs of this same form using a function (once the feature is implemented).</li>
<li>If you decide to represent the UUIDs as <code>BYTES</code> later, you can potentially convert them using a UDF.</li>
<li>Downside: Comparisons may not be as fast as with <code>BYTES</code>, depending on the operator, since string comparisons have to take UTF-8 encoding into account. (It sounds like this isn't an issue for you.)</li>
<li>Downside: Storage costs are higher, since the 36-character string is larger than the 16-byte value. (It sounds like this isn't an issue for you.)</li>
</ul>
<h3>Using <code>BYTES</code> </h3>
<ul>
<li>UUIDs are stored more compactly, so storage is cheaper and comparisons are faster.</li>
<li>If you decide to represent the UUIDs as <code>STRING</code>s later, you can potentially convert them using a UDF.</li>
<li>Downside: UUIDs are not compatible with other systems after export, and will likely not be compatible with BigQuery's implementation either.</li>
</ul>
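Converting between the two representations is mechanical: strip the hyphens and decode the 32 hex digits into 16 bytes, or format 16 bytes back into 8-4-4-4-12 groups. The sketch below shows that round trip in Python; the helper names `uuid_string_to_bytes` and `uuid_bytes_to_string` are hypothetical. Inside BigQuery you would express the same logic in SQL or a UDF instead; the built-in `FROM_HEX` and `TO_HEX` functions can handle the hex step, combined with removing or re-inserting the hyphens.

```python
import uuid


def uuid_string_to_bytes(s: str) -> bytes:
    """Convert an 8-4-4-4-12 UUID string to its 16-byte form (hypothetical helper)."""
    # uuid.UUID validates the input and accepts upper- or lower-case hex digits.
    return uuid.UUID(s).bytes


def uuid_bytes_to_string(b: bytes) -> str:
    """Convert 16 raw bytes back to the canonical lower-case 8-4-4-4-12 string."""
    if len(b) != 16:
        raise ValueError(f"expected 16 bytes, got {len(b)}")
    return str(uuid.UUID(bytes=b))


if __name__ == "__main__":
    # A random version 4 UUID in the same 8-4-4-4-12 string form.
    original = str(uuid.uuid4())
    packed = uuid_string_to_bytes(original)
    restored = uuid_bytes_to_string(packed)
    print(original, "->", packed.hex(), "->", restored)
    assert restored == original
```

Parsing through `uuid.UUID` rather than calling `bytes.fromhex` directly means a malformed string raises `ValueError` instead of silently producing a value of the wrong length.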

Create a column of UUIDs in Google BigQuery

answered Sep 18 '22 00:09

Elliott Brossard


Tags: uuid, google-bigquery

tashuhka
