Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Amazon Redshift - COPY from CSV - single Double Quote in row - Invalid quote formatting for CSV Error

I'm loading a CSV file from S3 into Redshift. This CSV file is analytics data which contains the PageUrl (which may contain user search info inside a query string for example).

It chokes on rows where there is a single, double-quote character, for example if there is a page for a 14" toy then the PageUrl would contain:

http://www.mywebsite.com/a-14"-toy/1234.html

Redshift understandably can't handle this as it is expecting a closing double quote character.

The way I see it my options are:

  1. Pre-process the input and remove these characters
  2. Configure the COPY command in Redshift to ignore these characters but still load the row
  3. Set MAXERRORS to a high value and sweep up the errors using a separate process

Option 2 would be ideal, but I can't find it!

Any other suggestions if I'm just not looking hard enough?

Thanks

Duncan

like image 400
Duncan Avatar asked Aug 06 '15 16:08

Duncan


1 Answers

It's 2017 and I run into the same problem, happy to report there is now a way to get redshift to load csv files with the odd " in the data.

The trick is to use the ESCAPE keyword, and also to NOT use the CSV keyword. I don't know why, but having the CSV and ESCAPE keywords together in a copy command resulted in failure with the error message "CSV is not compatible with ESCAPE;" However with no change to the loaded data I was able to successfully load once I removed the CSV keyword from the COPY command.

You can also refer to this documentation for help: http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-conversion.html#copy-escape

like image 174
ayeletd Avatar answered Oct 31 '22 02:10

ayeletd