
Need help creating schema for loading CSV into BigQuery

I am trying to load some CSV files into BigQuery from Google Cloud Storage and am wrestling with schema generation. There is an auto-generate option, but it is poorly documented. The problem is that if I let BigQuery generate the schema, it does a decent job of guessing data types, but it only sometimes recognizes the first row of the data as a header row; other times it treats the first row as data and generates column names like string_field_N. The first row of my data is always a header row. Some of the tables have many columns (over 30), and I do not want to hand-write the schema syntax, because BigQuery always fails with an uninformative error message when something (I have no idea what) is wrong with the schema.

So: How can I force it to recognize the first row as a header row? If that isn't possible, how do I get it to spit out the schema it generated in the proper syntax so that I can edit it (for appropriate column names) and use that as the schema on import?

Bill Rosenblatt asked Apr 12 '26 14:04

1 Answer

I would recommend doing two things here:

  1. Preprocess your file and store the final version of it without the first row, i.e. the header row.
  2. BQ load accepts an additional parameter, a JSON schema file; use this to explicitly define the table schema and pass the file as a parameter. This also gives you the flexibility to alter the schema at any point, if required.

Allowing BQ to autodetect schema is not advised.
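Since hand-writing the schema syntax is exactly what the asker wants to avoid, one way to implement step 2 is to generate the JSON schema file from the CSV header row itself. A minimal sketch, assuming a local copy of the file (the function name `schema_from_header` is mine; every column defaults to STRING, which you can then edit down to tighter types like INTEGER or TIMESTAMP before loading):

```python
import csv
import json

def schema_from_header(csv_path, out_path, default_type="STRING"):
    """Read the header row of a CSV and write a BigQuery-style JSON schema file.

    Column names have surrounding whitespace stripped and internal spaces
    replaced with underscores; all columns default to NULLABLE STRING.
    """
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))  # first row only
    schema = [
        {"name": col.strip().replace(" ", "_"),
         "type": default_type,
         "mode": "NULLABLE"}
        for col in header
    ]
    with open(out_path, "w") as f:
        json.dump(schema, f, indent=2)
    return schema
```

The resulting file is in the format that `bq load --schema=schema.json` expects; pairing it with `--skip_leading_rows=1` tells BigQuery to skip the header row instead of loading it as data, which also addresses the original question without any preprocessing.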

Raunak Jhawar answered Apr 15 '26 04:04