Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting BigQuery to auto detect schema

I'm trying to load a csv file and have BigQuery automatically create the schema.

bq load --source_format CSV -F '\t'  --skip_leading_rows=1 voterdb.voters gs://[[ redacted bucket ]]/extract.csv
BigQuery error in load operation: Error processing job 'tokyo-unity-87516:bqjob_r3682474e46ce720f_0000015421aee065_1': No schema specified on job or table.

According to https://cloud.google.com/bigquery/federated-data-sources automatically creating/inferring the schema is supported.

like image 884
Tim Clemans Avatar asked Apr 17 '16 01:04

Tim Clemans


People also ask

Can you automate BigQuery?

BigQuery automation can extend the capabilities of cohort analysis using Google Data Studio. BQ jobs can be automatically executed to process the GA data, store them in staging tables, and finally populate the cohort analysis report each month with new and updated data.

Does BigQuery support Altertable?

Data definition language (DDL) statements let you create and modify BigQuery resources using Google Standard SQL query syntax. You can use DDL commands to create, alter, and delete resources, such as tables, table clones, table snapshots, views, user-defined functions (UDFs), and row-level access policies.


2 Answers

The --autodetect flag is probably what you want. It works for CSV and (newline delimited) JSON input files.

For example...

bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect yourdataset.yourtable inputfile.json

See documentation here: https://cloud.google.com/bigquery/bq-command-line-tool#creatingtablefromfile

Note that this has nothing to do with federated data sources.

like image 85
Chris Sears Avatar answered Sep 22 '22 13:09

Chris Sears


As stated in a previous answer, schema auto-detection is a part of federated data source querying. It is not explicitly stated under the CSV and JSON schema auto-detection heading that this does not apply to bq load. If you feel this is unclear in the documentation, I would strongly suggest clicking Send feedback at the top righthand corner of that documentation page and describing this ambiguity in detail.

As for the bq load command, according to the bq load documentation, the table schema is a required parameter. Omitting it does result in the error message you are facing.

EDIT Thanks to polleyg for the update. In this blog post, it was announced that the schema should also be detected at load. As mentioned by Chris Sears, the --autodetect flag should meet your needs.

like image 45
Nicholas Avatar answered Sep 24 '22 13:09

Nicholas