
BigQuery loading data from bq command line tool - how to skip header rows

I have a CSV data file with a header row that I am using to populate a BigQuery table:

$ cat dummy.csv
Field1,Field2,Field3,Field4
10.5,20.5,30.5,40.5
10.6,20.6,30.6,40.6
10.7,20.7,30.7,40.7

When using the Web UI, there is a text box where I am able to specify how many header rows to skip. However, if I upload the data into BigQuery using the bq command line tool, I do not have an option to do this, and always get the following error:

$ bq load my-project:my-dataset.dummydata dummy.csv Field1:float,Field2:float,Field3:float,Field4:float
Upload complete.
Waiting on bqjob_r7eccfe35f_0000015e3e8c_1 ... (0s) Current status: DONE
BigQuery error in load operation: Error processing job 'my-project:bqjob_r7eccfe35f_0000015e3e8c_1': CSV table encountered too many errors, giving up. Rows: 1;
errors: 1.
Failure details:
- file-00000000: Could not parse 'Field1' as double for field Field1
(position 0) starting at location 0

The bq command line tool quickstart documentation also does not mention any options for skipping headers.

One simple/obvious solution is to edit dummy.csv to remove the header row, but this is not an option if pointing to a CSV file on Google Cloud Storage instead of the local file dummy.csv.
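For completeness, that local-file workaround can be sketched like this (`dummy_noheader.csv` is just an illustrative name, not part of the original commands):

```shell
# Recreate the sample CSV, then drop its header row (local files only)
cat > dummy.csv <<'EOF'
Field1,Field2,Field3,Field4
10.5,20.5,30.5,40.5
10.6,20.6,30.6,40.6
10.7,20.7,30.7,40.7
EOF
tail -n +2 dummy.csv > dummy_noheader.csv   # start output at line 2, skipping the header
head -n 1 dummy_noheader.csv                # → 10.5,20.5,30.5,40.5
```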

This is possible to do through the web interface, and through the Python API, so it should also be possible to do with the bq tool.

asked Sep 28 '17 21:09 by charlesreid1

1 Answer

Checking bq help load revealed a --skip_leading_rows option:

--skip_leading_rows : The number of rows at the beginning of the source file to skip.
    (an integer)

This option also appears in the bq command line tool documentation (which is separate from the quickstart documentation linked above).

Adding --skip_leading_rows=1 to the bq load command worked like a charm.

Here is the successful command:

$ bq load --skip_leading_rows=1 my-project:my-dataset.dummydata dummy.csv Field1:float,Field2:float,Field3:float,Field4:float
Upload complete.
Waiting on bqjob_r43eb07bad58_0000015ecea_1 ... (0s) Current status: DONE
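The same flag should also work when the source file lives on Google Cloud Storage, which addresses the case where editing the file locally is not an option. This is an untested sketch; the gs://my-bucket/dummy.csv path is hypothetical:

```shell
# Sketch: load from a GCS object instead of a local file.
# gs://my-bucket/dummy.csv is a hypothetical path; substitute your own bucket.
bq load --skip_leading_rows=1 \
    my-project:my-dataset.dummydata \
    gs://my-bucket/dummy.csv \
    Field1:float,Field2:float,Field3:float,Field4:float
```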
answered Oct 03 '22 03:10 by charlesreid1