
'Missing close double quote (") character' error when a CSV file contains line feeds while loading data into BigQuery

The offending line is shown below. It should consist of 14 columns, and one of them, starting with 'Hi I'm Niger...', spans multiple lines because it contains line feeds.

17935,9a7105ee-30c8-4a6d-9374-10875b7d6288.jpg,"""top""=>""0"", ""left""=>""0"", ""width""=>""180"", ""height""=>""180""",,"",2015-07-26 19:33:57.292058,2015-07-26 20:25:30.068887,fe43876f-1b2c-464a-aa20-bf335ed3ff62,c68c8c70-bc2b-11e4-90a1-22000b21105f,{},2e790350-15fb-0133-2cb8-22000ba51078,"Hi I'm Nigerian so wish to study in sweden.
so I'm Undergraduate student I want study Engineering. 
Thanks.","",{}

When I load this CSV data into BigQuery with the command bq load --replace --source_format=CSV -F"," ..., the load fails with the errors below. Could anyone suggest a fix for this bq load command?

- File: 0 / Line:17192 / Field:12: Missing close double quote (")
character: field starts with: <Hi I'm N>
- File: 0 / Line:17193: Too few columns: expected 14 column(s) but
got 1 column(s). For additional help: http://goo.gl/RWuPQ
- File: 0 / Line:17194: Too few columns: expected 14 column(s) but
got 3 column(s). For additional help: http://goo.gl/RWuPQ
Judking asked Nov 13 '15 14:11





2 Answers

If you are loading CSV with embedded newlines, you need to specify allowQuotedNewlines. With the bq command-line tool, the corresponding flag is --allow_quoted_newlines.

https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.allowQuotedNewlines

The BigQuery default is to assume that CSV data does not contain newlines. This allows for a much higher parsing throughput when dealing with large data files since the input files can be split at arbitrary newlines. If your data contains newlines within strings, each file needs to be parsed linearly by a single machine.
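To make the splitting problem concrete, here is a minimal sketch using only Python's standard csv module. It is an analogy, not BigQuery's actual parser: the sample data and the helper names naive_rows and quoted_rows are made up for illustration. Splitting on every newline first breaks a quoted multi-line field into fragments, while a sequential, quote-aware parse keeps the record whole.

```python
import csv
import io

# Hypothetical sample: record 1 has a quoted field spanning three physical lines.
SAMPLE = (
    '1,"first line\n'
    'second line\n'
    'third line",x\n'
    '2,"single line",y\n'
)


def naive_rows(text):
    """Split on every newline first, then on commas (ignores quoting)."""
    return [line.split(",") for line in text.splitlines()]


def quoted_rows(text):
    """Parse sequentially so newlines inside quotes stay in their field."""
    return list(csv.reader(io.StringIO(text, newline="")))


naive = naive_rows(SAMPLE)
quoted = quoted_rows(SAMPLE)

# Column counts per "row" under each strategy.
print([len(row) for row in naive])
print([len(row) for row in quoted])
```

The naive split yields rows with inconsistent column counts, which is the same kind of symptom as the "Too few columns" errors in the question. The quote-aware parse returns two rows of three columns each.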

Michael Sheldon answered Sep 18 '22 14:09



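If you load the data with the Python client library, make sure you set allow_quoted_newlines on the job config before loading data to BigQuery:

from google.cloud import bigquery

job_config = bigquery.LoadJobConfig()
# Accept quoted CSV fields that contain newline characters
job_config.allow_quoted_newlines = True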
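The snippet above only builds the configuration. As a rough sketch of how it might be passed to a load job with the google-cloud-bigquery package, assuming default credentials are available and the destination table already exists with a matching 14-column schema: the function name, csv_path, and table_id are hypothetical, and WRITE_TRUNCATE is used here as the counterpart of the --replace flag from the question.

```python
def load_csv_with_quoted_newlines(csv_path, table_id):
    """Load a local CSV file into BigQuery, allowing newlines inside quoted fields.

    csv_path and table_id are placeholders supplied by the caller,
    e.g. table_id = "my_project.my_dataset.my_table" (hypothetical).
    """
    # Imported inside the function so the sketch can be read and defined
    # without the client library installed; normally this goes at module top.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default project and credentials

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        # Accept quoted fields that contain newline characters.
        allow_quoted_newlines=True,
        # Overwrite the destination table, like `bq load --replace`.
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )

    with open(csv_path, "rb") as source_file:
        load_job = client.load_table_from_file(
            source_file, table_id, job_config=job_config
        )

    load_job.result()  # wait for the job to finish; raises on failure
    return load_job.output_rows
```

Calling load_csv_with_quoted_newlines("data.csv", "my_project.my_dataset.my_table") would then return the number of rows loaded, with both arguments standing in for your own file and table.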
Ceren Erdogan answered Sep 18 '22 14:09
