Is there a way to include commas in CSV columns without breaking the formatting?

Tags:

csv

I've got a two-column CSV with a name and a number. Some people's names contain commas, for example Joe Blow, CFA. This comma breaks the CSV format, since it's interpreted as the start of a new column.

I've read up and the most common prescription seems to be replacing that character, or replacing the delimiter, with a new value (e.g. this|that|the, other).

I'd really like to keep the comma separator (I know Excel supports other delimiters, but other interpreters may not). I'd also like to keep the comma in the name, as Joe Blow| CFA looks pretty silly.

Is there a way to include commas in CSV columns without breaking the formatting, for example by escaping them?

buley asked Jan 06 '11 17:01



2 Answers

Enclose the field in double quotes, e.g.

field1_value,field2_value,"field 3,value",field4, etc... 

See wikipedia.
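As a minimal sketch of the quoting rule above, here is how Python's standard csv module handles the name from the question. The sample rows and numbers are made up for illustration; the writer adds the double quotes on its own whenever a field contains the delimiter.

```python
import csv
import io

# Hypothetical sample data: a name column and a number column.
rows = [["Joe Blow, CFA", 42], ["Jane Doe", 7]]

buffer = io.StringIO()
writer = csv.writer(buffer)  # default dialect: comma delimiter, minimal quoting
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)  # the first name is wrapped in double quotes, the second is not

# Reading the text back gives two columns per row; the comma stays in the name.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(parsed)
```

Note that the reader returns every value as a string, so the number column would need converting back if it is used as a number.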

Updated:

To encode a double quote inside a field, double it and enclose the whole field in quotes: a single " in a field is encoded as "", so a field that consists of just one " becomes """". So if you see the following in e.g. Excel:

---------------------------------------
| regular_value |,,,"|  ,"", |"""   |"|
---------------------------------------

the CSV file will contain:

regular_value,",,,""",","""",","""""""","""" 

A comma is simply encapsulated using quotes, so , becomes ",".

A field containing both commas and quotes needs to be enclosed in quotes and have its quotes doubled, so "," becomes """,""".
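The Excel row above can be checked with a short sketch using Python's csv module, which applies the same quote-doubling rule by default. The variable names are only illustrative.

```python
import csv
import io

# The five cell values from the Excel row shown above.
cells = ['regular_value', ',,,"', ',"",', '"""', '"']

buffer = io.StringIO()
# lineterminator="\n" only keeps the printed output tidy.
csv.writer(buffer, lineterminator="\n").writerow(cells)
line = buffer.getvalue().rstrip("\n")
print(line)

# Parsing the encoded line recovers the original cell values.
decoded = next(csv.reader(io.StringIO(line)))
print(decoded == cells)
```

Each field that contains a comma or a quote is enclosed in quotes, and every quote inside it is doubled, which is why the cell holding three quotes ends up as eight quote characters.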

Ryan answered Sep 19 '22 23:09


The problem with the CSV format is that there isn't one spec; there are several accepted methods, with no way of telling which one should be used when generating or interpreting a file. I discussed all the methods to escape characters (newlines in that case, but the same basic premise) in another post. Basically, it comes down to using a CSV generation/escaping process that suits the intended users, and hoping the rest don't mind.

Reference spec document.
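To illustrate why the choice of convention matters, here is a rough sketch comparing two escaping styles that Python's csv module can produce: doubled quotes (the default) and backslash escaping. The helper names encode and decode and the sample row are hypothetical, and these are only two of the conventions found in practice.

```python
import csv
import io

def encode(row, **fmt):
    """Write one row to a string using the given csv formatting options."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerow(row)
    return buf.getvalue()

def decode(text, **fmt):
    """Parse the first row of a CSV string using the given options."""
    return next(csv.reader(io.StringIO(text), **fmt))

row = ['say "hi", ok', "42"]  # made-up sample containing a quote and a comma

# Convention 1: quotes inside a field are doubled (csv module default).
doubled = encode(row)

# Convention 2: quotes inside a field are escaped with a backslash.
backslash_style = dict(doublequote=False, escapechar="\\")
escaped = encode(row, **backslash_style)

# Each convention round-trips when writer and reader agree.
print(decode(doubled) == row)
print(decode(escaped, **backslash_style) == row)

# Reading one convention with the other's rules silently mangles the row.
mismatched = decode(escaped)
print(mismatched == row)
```

The last check is the practical point: nothing in the file says which convention was used, so the writer and the reader have to agree on it in advance.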

Rudu answered Sep 19 '22 23:09