Please check my understanding of REPEATED fields in the following examples:
{ "title": "History of Alphabet", "author": [ { "name": "Larry" } ] }
This JSON has schema:
[ { "name": "title", "type": "STRING" }, { "name": "author", "type": "RECORD", "fields": [ { "name": "name", "type": "STRING" } ] } ]
But the following JSON
{ "title": "History of Alphabet", "author": ["Larry", "Steve", "Eric"] }
has schema:
[ { "name": "title", "type": "STRING" }, { "name": "author", "type": "STRING", "mode": "REPEATED" } ]
Is this correct?
NB: I went through the documentation but couldn't find any explanation of this.
How to Query BigQuery Repeated Fields: To extract information from a repeated field in BigQuery, you need a different query pattern than for an ordinary scalar column. This is normally done with the UNNEST operator, which converts the elements of an array into rows. These rows can then be joined to the original table and queried.
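To make the array-to-rows idea concrete, here is a rough pure-Python sketch of the flattening that UNNEST performs. It is only an illustration, not BigQuery code: the `unnest` helper is a hypothetical name, and the sample row is the second JSON example from the question.

```python
# Pure-Python illustration of UNNEST-style flattening; this does not call BigQuery.
def unnest(rows, repeated_field):
    """Yield one output row per element of each row's repeated field."""
    for row in rows:
        for value in row.get(repeated_field, []):
            # Copy the non-repeated columns, then attach a single element.
            flat = {k: v for k, v in row.items() if k != repeated_field}
            flat[repeated_field] = value
            yield flat


books = [
    {"title": "History of Alphabet", "author": ["Larry", "Steve", "Eric"]},
]

flat_rows = list(unnest(books, "author"))
for flat_row in flat_rows:
    print(flat_row)
```

Each author ends up on its own row next to the title. In this sketch a row whose array is empty yields no output rows, which is also what a cross join against UNNEST does in BigQuery.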
A repeated column is a column that can contain multiple values per row. For example, the column [Cities lived] in the data table below lists every city that the person has lived in. It could be just one city, or it could be many different places. In Spotfire, data tables with repeated columns are flattened.
Close. In your first example, author is an array of objects, which corresponds to a repeated record in BQ. So the schema would be:
[ { "name": "title", "type": "STRING" }, { "name": "author", "type": "RECORD", "mode": "REPEATED", <--- NOTE! "fields": [ { "name": "name", "type": "STRING" } ] } ]
Your second data/schema pair looks good (but note that the overall schema is an array, not an object, and it needs commas between elements).
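If it helps, here is a small Python sketch for checking a JSON row against a schema written in this format. It is not part of any BigQuery library: `conforms` and `row_conforms` are hypothetical helpers, only STRING and RECORD types are handled, and absent fields are simply allowed.

```python
def conforms(value, field):
    """Return True if `value` fits one BigQuery-style schema field."""
    if field.get("mode") == "REPEATED":
        # A REPEATED field holds a JSON array; each element is checked
        # against the same field definition with the mode removed.
        single = {k: v for k, v in field.items() if k != "mode"}
        return isinstance(value, list) and all(conforms(v, single) for v in value)
    if field["type"] == "RECORD":
        return isinstance(value, dict) and row_conforms(value, field["fields"])
    if field["type"] == "STRING":
        return isinstance(value, str)
    raise ValueError(f"type not covered by this sketch: {field['type']}")


def row_conforms(row, schema):
    """Check every key in `row` against a schema given as a list of fields."""
    by_name = {f["name"]: f for f in schema}
    return all(k in by_name and conforms(v, by_name[k]) for k, v in row.items())


title = {"name": "title", "type": "STRING"}
name = {"name": "name", "type": "STRING"}

# First example: an array of objects needs RECORD plus mode REPEATED.
row1 = {"title": "History of Alphabet", "author": [{"name": "Larry"}]}
schema1_without_mode = [title, {"name": "author", "type": "RECORD", "fields": [name]}]
schema1_repeated = [title, {"name": "author", "type": "RECORD", "mode": "REPEATED", "fields": [name]}]

# Second example: an array of strings is STRING plus mode REPEATED.
row2 = {"title": "History of Alphabet", "author": ["Larry", "Steve", "Eric"]}
schema2 = [title, {"name": "author", "type": "STRING", "mode": "REPEATED"}]

print(row_conforms(row1, schema1_without_mode))
print(row_conforms(row1, schema1_repeated))
print(row_conforms(row2, schema2))
```

The first check fails because a plain RECORD expects a single object rather than an array. The other two pass, which matches the explanation above.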
There is some discussion of nested and repeated fields here: https://cloud.google.com/bigquery/docs/data?hl=en#nested
There are also some sample JSON data objects here: https://cloud.google.com/bigquery/preparing-data-for-bigquery#dataformats
But I agree we don't do a good job of explaining how those objects map to BQ schemas. Sorry about that!