 

Error Loading Large CSV into Google BigQuery

I'm getting an error when loading a large CSV into BigQuery. Everywhere I read online, I see that there is a 5 GB size limit on zipped files but no limit on plain CSV files.

BigQuery error in load operation: Error processing job 'bqjob_r3016bbfad3037f_0000015cea1a1eff_1': Input CSV files are not splittable and at least one of the files is larger than the maximum allowed size. Size is: 24686861596. Max allowed size is: 4294967296.

asked Jun 27 '17 by user3078500


2 Answers

Try this:

  • Turn off quoting
  • Set the field delimiter to a character that does not occur in the data.

bq help load:

--quote: Quote character to use to enclose records. Default is ". To indicate no quote character at all, use an empty string.
-F,--field_delimiter: The character that indicates the boundary between columns in the input file. "\t" and "tab" are accepted names for tab.

This will import each CSV line into a one-column table. Split it afterwards within BigQuery (with REGEXP_EXTRACT(), SPLIT(), or a JavaScript UDF).
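For example, a load along these lines should work. This is only a minimal sketch: the dataset, table, bucket, and file names are placeholders, and "þ" is simply an arbitrary character assumed not to appear in the data.

  # Load every line of the CSV as a single STRING column named "line"
  bq load --source_format=CSV --quote="" --field_delimiter="þ" \
      mydataset.raw_lines gs://mybucket/big_file.csv \
      line:STRING

Afterwards you can split each line inside BigQuery, e.g. SELECT SPLIT(line, ',') FROM mydataset.raw_lines, or use REGEXP_EXTRACT() for fields that need more careful parsing.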

answered Sep 21 '22 by Felipe Hoffa


The BigQuery documentation lists various limits for load jobs here: https://cloud.google.com/bigquery/quota-policy#load_jobs. In particular, it notes that the limit for a compressed CSV file is 4 GB.

The error message about a "not splittable" CSV file can arise in two cases:

  1. The CSV file was compressed.
  2. There is a quote character mismatch in one of the fields, which makes the field look like one very long string and also makes the file non-splittable (this is likely what happened in your case); see the workaround sketch below.
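Either way, the fix is to make the input splittable before loading. For the compressed case, a rough sketch is to decompress the file and load it uncompressed, since uncompressed CSV files are splittable and not subject to the 4 GB cap. The bucket, dataset, table, and schema file names below are placeholders (./schema.json is hypothetical):

  # Decompress (locally or on a VM with enough disk), re-upload, then load
  gunzip big_file.csv.gz
  gsutil cp big_file.csv gs://mybucket/big_file.csv
  bq load --source_format=CSV mydataset.mytable gs://mybucket/big_file.csv ./schema.json

If the file is not compressed and the problem is a mismatched quote character, the --quote="" approach from the first answer sidesteps the issue by disabling quoting entirely.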
answered Sep 19 '22 by Mosha Pasumansky