I'm generating a CSV file (delimited by commas rather than tabs). My users will most likely open the CSV file in Excel by double clicking it. My data may contain commas and speech marks, so I'm escaping those as follows.
Reference, Title, Description 1, "My little title", "My description, which may contain ""speech marks"" and commas." 2, "My other little title", "My other description, which may also contain ""speech marks"" and commas."
As far as I know that's always been the way to do it. Here's my boggle: when I open this file in Excel 2010 my escaping is not respected. Speech marks appear on the sheet, and the comma causes new columns.
Since CSV files use the comma character "," to separate columns, values that contain commas must be handled as a special case. These fields are wrapped within double quotation marks. The first double quote signifies the beginning of the column data, and the last double quote marks the end.
By default, the escape character is a " (double quote) for CSV-formatted files. If you want to use a different escape character, use the ESCAPE clause of COPY , CREATE EXTERNAL TABLE or gpload to declare a different escape character.
This comma breaks the CSV format, since it's interpreted as a new column. I've read up and the most common prescription seems to be replacing that character, or replacing the delimiter, with a new value (e.g. this|that|the, other ).
We eventually found the answer to this.
Excel will only respect the escaping of commas and speech marks if the column value is NOT preceded by a space. So generating the file without spaces like this...
Reference,Title,Description 1,"My little title","My description, which may contain ""speech marks"" and commas." 2,"My other little title","My other description, which may also contain ""speech marks"" and commas."
... fixed the problem. Hope this helps someone!
Below are the rules if you believe it's random. A utility function can be created on the basis of these rules.
If the value contains a comma, newline or double quote, then the String value should be returned enclosed in double quotes.
Any double quote characters in the value should be escaped with another double quote.
If the value does not contain a comma, newline or double quote, then the String value should be returned unchanged.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With