I am attempting to import a large amount of .csv files into Google BigQuery.
One of my String columns sometimes contains an array in which the double quotes are escaped with backslashes (\"). Example:
...,"{\"ex1\":\"somestuffhere\"\,\"ex2\":\"somestuffhere\"\,\"ex3\":\"somestuffhere\"}",
...
I believe the issue is caused by BigQuery by default expecting double quotes to be escaped with another double quote (""). I have experimented with passing different values to the --quote argument, but it does not behave as I was hoping.
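For comparison, here is roughly what the same row would look like with standard CSV escaping, where embedded quotes are doubled instead of backslash-escaped and commas inside the quoted field need no escaping (illustrative only, not my actual data):
...,"{""ex1"":""somestuffhere"",""ex2"":""somestuffhere"",""ex3"":""somestuffhere""}",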
The bq load command I am currently attempting:
bq load --noreplace --skip_leading_rows=1 --allow_jagged_rows='true' --ignore_unknown_values='true' --quote='"' --source_format=CSV TestDataset.cmdLineTest gs://s3_data_transfers/Events_10422/LIVE/* myschema.json
Is there any way I can get the platform to accept these rows, or am I forced to investigate preprocessing steps? This table will be updated every day based on new CSVs being sent, so ideally I would like to keep preprocessing to a minimum.
Thanks!
I think this issue is due to the --quote argument that you have already pointed out. Since " is the default character used to quote CSV data sections, it conflicts with data that itself contains ". I have been testing with some similar data (containing \") and BigQuery returns the same error:
Error: Data between close double quote (") and field separator.
The solution is to set the --quote flag to a different character, which resolves the conflict. In my case, it worked with '. The following command worked for me:
bq --location=US load --autodetect --source_format=CSV --quote "'" MY_DATASET.MY_TABLE MY_CSV_FILE
Make sure that your data does not contain '; otherwise, pick another character, or set the flag to an empty string as the documentation states. Refer to the documentation for details about this flag.
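Applied to the command from your question, this would look roughly like the following (a sketch keeping your original dataset, bucket, and schema file, with only the --quote value changed):
bq load --noreplace --skip_leading_rows=1 --allow_jagged_rows='true' --ignore_unknown_values='true' --quote="'" --source_format=CSV TestDataset.cmdLineTest gs://s3_data_transfers/Events_10422/LIVE/* myschema.json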