Background
I want to insert/append newline formatted JSON into bigquery
table through python client API.
Eg:
{"name":"xyz",mobile:xxx,location:"abc"}
{"name":"xyz",mobile:xxx,age:22}
Issue is, all fields in a row are optional and there is no fixed defined schema for the data.
Query
I have read that we can use Federated tables, which supports autoschema detection.
However, I am looking for a feature, that would automatically detect schema from data,create tables accordingly and even adjust the table schema if any extra columns/keys appear in data instead of creating new table.
Would this be possible using python client API.
You can use autodetect with BigQuery load API, i.e. your example using bq cli tool will look like following:
~$ cat /tmp/x.json
{"name":"xyz","mobile":"xxx","location":"abc"}
{"name":"xyz","mobile":"xxx","age":"22"}
~$ bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON tmp.x /tmp/x.json
Upload complete.
~$ bq show tmp.x
Table tmp.x
Last modified Schema Total Rows Total Bytes Expiration
----------------- --------------------- ------------ ------------- ------------
16 Aug 08:23:35 |- age: integer 2 33
|- location: string
|- mobile: string
|- name: string
~$ bq query "select * from tmp.x"
+------+----------+--------+------+
| age | location | mobile | name |
+------+----------+--------+------+
| NULL | abc | xxx | xyz |
| 22 | NULL | xxx | xyz |
+------+----------+--------+------+
Update: If later you need to add additional fields, you can use schema_update_option to allow new fields. Alas it doesn't yet work with autodetect, so you need to provide new schema explicitly to the load API:
~$ cat /tmp/x1.json
{"name":"abc","mobile":"yyy","age":"25","gender":"male"}
~$ bq load --schema=name:STRING,age:INTEGER,location:STRING,mobile:STRING,gender:STRING --schema_update_option=ALLOW_FIELD_ADDITION --source_format=NEWLINE_DELIMITED_JSON tmp.x /tmp/x1.json
Upload complete.
~$ bq show tmp.x
Table tmp.x
Last modified Schema Total Rows Total Bytes Expiration
----------------- --------------------- ------------ ------------- -----------
19 Aug 10:43:09 |- name: string 3 57
|- age: integer
|- location: string
|- mobile: string
|- gender: string
~$ bq query "select * from tmp.x"
status: DONE
+------+------+----------+--------+--------+
| name | age | location | mobile | gender |
+------+------+----------+--------+--------+
| abc | 25 | NULL | yyy | male |
| xyz | NULL | abc | xxx | NULL |
| xyz | 22 | NULL | xxx | NULL |
+------+------+----------+--------+--------+
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With