I have a newline-delimited JSON file. Is it possible to generate a schema using a tool like jq? I've had some success with jq in the past but haven't done something as complicated as this.
Here's the format of the schema I'm aiming for: https://cloud.google.com/bigquery/docs/nested-repeated#example_schema. Notice that nesting is handled with a fields key of the parent, and arrays are handled with "mode": "repeated". (Any help with some sort of schema is greatly appreciated and I then can massage into this format).
Copying from the link above, I'd like to generate from this:
{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}
...to...
[
{
"name": "id",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "first_name",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "last_name",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "dob",
"type": "DATE",
"mode": "NULLABLE"
},
{
"name": "addresses",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "status",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "address",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "city",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "state",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "zip",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "numberOfYears",
"type": "STRING",
"mode": "NULLABLE"
}
]
}
]
(ref BigQuery autodetect doesn't work with inconsistent json?, showing that I can't use the BigQuery autodetect because the items aren't the same. I'm fairly confident I can merge schemas together manually to create a superset)
Here's a simple recursive function that may help if you decide to roll your own:
def schema:
def isdate($v): $v | test("[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]");
def array($k;$v): {"name":$k,"type":"RECORD",mode:"REPEATED","fields":($v[0] | schema)};
def date($k): {"name":$k,"type":"DATE", mode:"NULLABLE"};
def string($k): {"name":$k,"type":"STRING",mode:"NULLABLE"};
def item($k;$v):
$v | if type == "array" then array($k;$v)
elif type == "string" and isdate($v) then date($k)
elif type == "string" then string($k)
else empty end;
[ to_entries[] | item(.key;.value) ]
;
schema
Try it online!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With