I want to store an array of string values in a single column in BigQuery. In order to do that I need to first create a table with the proper schema. The schema identifier for an array is ARRAY<T>, where T is the element data type. For example, replacing T with INT64, as in ARRAY<INT64>, enables the storage of a 64-bit integer array in that column. How should I get this same effect but for storing string values? I have already tried STRING, VARCHAR and CHAR.

Just to mention: I am using the latest google-cloud Python package.
Documentation Reference for BigQuery Schema
To declare a specific data type for an array, use angle brackets (< and >). For example: SELECT ARRAY<FLOAT64>[1, 2, 3] AS floats; Arrays of most data types, such as INT64 or STRING, don't require that you declare them first.
The ALTER statement allows us to modify the structure of an existing table. Using an ALTER TABLE ADD COLUMN statement, we can add one or more columns to an existing table in BigQuery.
In order to add a column with array data, you need to define that column's mode as REPEATED. Hence a sample schema ends up being:
{
'name': 'array_of_strings',
'type': 'STRING',
'mode': 'REPEATED'
},{
'name': 'array_of_floats',
'type': 'FLOAT',
'mode': 'REPEATED'
}
This makes the field hold array values.
NOTE: Be aware that with this schema you cannot use BigQuery's CSV import functionality, since the CSV file format cannot represent arrays. You will need to use either the JSON (newline-delimited) or Avro format instead.
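For illustration, a single row matching the schema above looks like this in newline-delimited JSON, where each REPEATED field is simply a JSON array (the values are made up):

```python
import json

# One JSON object per line in the load file; arrays map to REPEATED fields.
row = {"array_of_strings": ["a", "b", "c"], "array_of_floats": [1.0, 2.5]}
line = json.dumps(row)
print(line)
```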
Reference to the GitHub issue
In order to do that I need to first create a table with the proper schema.
Just run the query below in the Web UI with a new destination table set, to create the needed schema:
#standardSQL
SELECT ARRAY<STRING>[] AS array_of_strings