I have CSV files, tab-separated, fields not wrapped in quotes, where field data can contain characters like single quotes, double quotes, pipes and backslashes.
Sample data can look like this:
1 2 "ba$aR\eR\ 18
I want to import this data into Postgres using the COPY statement.
When I try to import this using
COPY <tablename> FROM <filename> NULL AS '';
I get an error psql:-:1: ERROR: missing data for column
because Postgres is treating the backslash + tab as an "escaped tab" instead of a backslash followed by the field separator.
So I switched to using the "CSV format" of the COPY operator, like so:
COPY <tablename> FROM <filename> WITH CSV DELIMITER E'\t' NULL AS '';
Now there's a new error psql:-:1: ERROR: value too long for type character varying(254)
Apparently because it's interpreting the double-quote at the start of field 3 as the field wrapping character.
How can I specify that my data is NOT quoted at all?
By default, the escape character is a " (double quote) for CSV-formatted files. If you want to use a different escape character, use the ESCAPE clause of COPY , CREATE EXTERNAL TABLE or gpload to declare a different escape character.
You can put quotes, dashes and spaces in the CSV file. Fields that contain a special character (comma, newline, or double quote), must be enclosed in double quotes. If a field's value contains a double quote character it is escaped by placing another double quote character next to it.
Normally single and double quotes are commonly used with any text data in PostgreSQL. To ignore or escape the single quote is a common requirement of all database developers. By using double quotes and backslash we can avoid the complexity of single quotes as well as it is easy to read and maintain.
Quotation marks appear in CSV files as text qualifiers. This means, they function to wrap together text that should be kept as one value, versus what are distinct values that should be separated out.
Workaround (thanks to this comment!)
COPY <tablename> FROM <filename> WITH CSV DELIMITER E'\t' QUOTE E'\b' NULL AS '';
So basically specifying a quote character that should never be in the text, but that's pretty ugly.
I'd much prefer it if there was in fact a way to turn off quote processing altogether.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With