create a schema for a nested table - bigquery

I am trying to upload a data table for testing that contains multiple levels of nesting, but I cannot seem to get the syntax right for specifying the schema.

Here is my current schema file:

{
  "name":"city", "type":"RECORD",
    [
      {"name":"id", "type":"INTEGER"},
      {"name":"name", "type":"STRING"},
      {"name":"country", "type":"STRING"},
      {"name":"coord", "type":"RECORD"},
        [
          {"name":"lon", "type":"FLOAT"},
          {"name":"lat", "type":"FLOAT"}
        ],
    {"name":"time", "type":"TIMESTAMP"}
  ]
}

Here is a sample of the data:

{"city":{"id":1283240,"name":"Kathmandu","country":"NP","coord":{"lon":85.316666,"lat":27.716667}},"time":1394865171,"data":[{"dt":1394852400,"main":{"temp":296.15,"temp_min":293.866,"temp_max":296.15}},{"dt":1394863200,"main":{"temp":301.51,"temp_min":299.345,"temp_max":301.51}}]}

In the full file I have multiple cities, each with multiple "data" points per day.

Thanks

Mark

asked Mar 27 '26 13:03 by Mark Olliver


1 Answer

When you have a RECORD type, you need to give the nested schema array the property name "fields". As in:

{
  "name":"city", "type":"RECORD",
  "fields": [
    {"name":"id", "type":"INTEGER"},
    {"name":"name", "type":"STRING"},
    {"name":"country", "type":"STRING"},
    {"name":"coord", "type":"RECORD",
     "fields": [
       {"name":"lon", "type":"FLOAT"},
       {"name":"lat", "type":"FLOAT"}
     ]},
    {"name":"time", "type":"TIMESTAMP"}
  ]
}

You also had the } that closes the "coord" field in the wrong place: it has to come after the nested "fields" array, not before it.
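If the schema gets deeper, it can help to check the "fields" rule mechanically. Here is a rough sketch of how I might do that; the helper name records_missing_fields is made up for this example (it is not part of any BigQuery tool), and it only checks this one rule, not the rest of the schema format. The two schemas are cut-down versions of the one above.

```python
import json


def records_missing_fields(field, path=""):
    """Return dotted paths of RECORD fields that lack a non-empty "fields" list.

    Hypothetical helper for illustration only.
    """
    name = path + field.get("name", "<unnamed>")
    problems = []
    if field.get("type") == "RECORD":
        nested = field.get("fields")
        if not isinstance(nested, list) or not nested:
            # A RECORD with no named "fields" array is the mistake described above.
            problems.append(name)
        else:
            # Recurse so that nested RECORDs are checked too.
            for child in nested:
                problems.extend(records_missing_fields(child, name + "."))
    return problems


good = json.loads("""
{"name":"city", "type":"RECORD",
 "fields": [
   {"name":"id", "type":"INTEGER"},
   {"name":"coord", "type":"RECORD",
    "fields": [
      {"name":"lon", "type":"FLOAT"},
      {"name":"lat", "type":"FLOAT"}
    ]}
 ]}
""")

bad = json.loads("""
{"name":"city", "type":"RECORD",
 "fields": [
   {"name":"id", "type":"INTEGER"},
   {"name":"coord", "type":"RECORD"}
 ]}
""")

print(records_missing_fields(good))  # []
print(records_missing_fields(bad))   # ['city.coord']
```

Note that this only works once the text is valid JSON, so it complements a plain json.loads() check rather than replacing it.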

One trick I like is to run the schema through Python's json.loads() function to verify that it is actually valid JSON, since it can be hard to tell whether you have all of the commas you need and have closed all of your quotes and brackets correctly. For example:

$ python
>>> import json
>>> schema = """
... <paste your initial schema>
... """
>>> json.loads(schema)

ValueError: Expecting property name: line 4 column 5 (char 41)

(It is complaining that an array appears where a property name is expected; you need "fields" here.)
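The same check can be done as a small script rather than in the interactive prompt. This is just a sketch: check_json is a name I made up, and the two strings are shortened versions of the broken and fixed schemas. On Python 3 the exception is json.JSONDecodeError, which is a subclass of ValueError, and the wording of the message differs slightly from the one shown above.

```python
import json


def check_json(text):
    """Return None if text parses as JSON, otherwise the parser's error message.

    Hypothetical helper for illustration only.
    """
    try:
        json.loads(text)
    except ValueError as err:  # json.JSONDecodeError subclasses ValueError
        return str(err)
    return None


# Shortened version of the original schema: the array has no property name.
broken = """
{
  "name":"city", "type":"RECORD",
    [
      {"name":"id", "type":"INTEGER"}
    ]
}
"""

# Same schema with the array named "fields".
fixed = """
{
  "name":"city", "type":"RECORD",
  "fields": [
      {"name":"id", "type":"INTEGER"}
    ]
}
"""

print(check_json(broken))  # an error message pointing at the unnamed array
print(check_json(fixed))   # None
```

The line and column in the message are counted from the start of the string, so the leading blank line after the opening triple quote counts as line 1.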

answered Mar 29 '26 15:03 by Jordan Tigani