I am looking for suggestions on how to handle a csv file that is being created, then uploaded by our customers, and that may have a comma in a value, like a company name.
Some of the ideas we are looking at are: quoted Identifiers (value "," values ","etc) or using a | instead of a comma. The biggest problem is that we have to make it easy, or the customer won't do it.
You need to specify text qualifiers. Generally a double quote (") is used as text qualifiers. All the text is always put inside it and all the commas inside a text qualifier is ignored. This is a standard method for all CSV, languages and all platforms for properly handling the text.
This comma breaks the CSV format, since it's interpreted as a new column. I've read up and the most common prescription seems to be replacing that character, or replacing the delimiter, with a new value (e.g. this|that|the, other ).
There's actually a spec for CSV format, RFC 4180 and how to handle commas: Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.
There's actually a spec for CSV format, RFC 4180 and how to handle commas:
Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.
http://tools.ietf.org/html/rfc4180
So, to have values foo
and bar,baz
, you do this:
foo,"bar,baz"
Another important requirement to consider (also from the spec):
If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:
"aaa","b""bb","ccc"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With