Nearly any delimiter is better than a comma. The reason is that when comma-delimited files are read into some data parsing tools, commas that are ordinary punctuation inside the data can be mistaken for delimiters, disrupting the "layout" of the fields or columns.
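To make the problem concrete, here is a minimal sketch in Python. The sample row is made up for illustration, and it assumes a writer and reader that do no quoting at all (a proper CSV library with quoting would avoid this).

```python
# Hypothetical row: the second field is an address that itself contains a comma.
fields = ["Ada Lovelace", "12 St James's Square, London", "1815"]

line = ",".join(fields)    # naive comma-delimited output, no quoting
parsed = line.split(",")   # naive comma-delimited input

print(len(fields))  # 3 fields went in
print(len(parsed))  # 4 fields came out: the address was cut in two
```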
A delimiter is a sequence of one or more characters for specifying the boundary between separate, independent regions in plain text, mathematical expressions or other data streams. An example of a delimiter is the comma character, which acts as a field delimiter in a sequence of comma-separated values.
Well, there are a few separator characters in US-ASCII: hex 1c, 1d, 1e and 1f. Plain text shouldn't contain them.
1c FS ␜ ^\ File Separator
1d GS ␝ ^] Group Separator
1e RS ␞ ^^ Record Separator
1f US ␟ ^_ Unit Separator
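As a rough sketch of how these could be used, the Python below joins fields with the Unit Separator and records with the Record Separator. The function names `encode` and `decode` and the sample rows are my own, not part of any standard; the code assumes the data never contains either control character and refuses to write it if it does.

```python
US = "\x1f"  # Unit Separator: placed between fields
RS = "\x1e"  # Record Separator: placed between records

def encode(rows):
    """Join rows of string fields into one string (hypothetical helper)."""
    for row in rows:
        for field in row:
            # Assumption: plain-text data never contains these characters.
            if US in field or RS in field:
                raise ValueError("field contains a separator character")
    return RS.join(US.join(row) for row in rows)

def decode(text):
    """Split the encoded string back into rows of fields."""
    return [record.split(US) for record in text.split(RS)]

# Commas and newlines inside fields need no quoting or escaping here.
rows = [["Smith, John", "New York, NY"], ["line one\nline two", "x"]]
encoded = encode(rows)
restored = decode(encoded)
```

The trade-off is that these characters are invisible in most editors, so the file is harder to inspect by eye.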
No matter which character you choose as your separator, you'll want to escape any instance of that character in your data.
Perhaps tilde (~), or go to a high-ASCII character.
Either way, if there's any chance that it could sneak into your data, you'd want to escape it before writing to your plaintext file.
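One possible way to do that escaping, sketched in Python. The choice of backslash as the escape character and the helper names are assumptions of mine; any scheme works as long as the writer and the reader agree on it.

```python
DELIM = "~"   # the chosen delimiter
ESC = "\\"    # assumed escape character (a single backslash)

def escape(field):
    # Escape the escape character first, then the delimiter.
    return field.replace(ESC, ESC + ESC).replace(DELIM, ESC + DELIM)

def join_fields(fields):
    return DELIM.join(escape(f) for f in fields)

def split_fields(line):
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == ESC:
            # Take the next character literally
            # (nothing is added if the line ends on a lone escape).
            current.append(next(chars, ""))
        elif ch == DELIM:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields

original = ["a~b", "c\\d", "plain", ""]
line = join_fields(original)
restored = split_fields(line)
```

Note that a plain `line.split("~")` would not be enough on the reading side, since it cannot tell an escaped tilde from a real delimiter.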
I think the best way is to join the strings with a three-character sequence such as '@@@'.
For a particular data warehousing situation where we had control over the source file, but escaping and qualifying were onerous, we were able to make the business decision that one extended ASCII character would be stripped from the data (if it ever occurs, which it hasn't).
On creation of the delimited source file, we stripped out any instances of █ (alt+219) in the data and used that character as the delimiter. Bonus: that character is really easy to spot.
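Roughly, that approach might look like the Python sketch below. The helper names are hypothetical, and it assumes the file is handled as Unicode text, where █ is U+2588 (the character at code 219 in code page 437).

```python
DELIM = "\u2588"  # █ FULL BLOCK

def write_row(fields):
    # Strip the delimiter from the data instead of escaping it.
    # This is lossy by design: any █ in a field is silently dropped.
    return DELIM.join(f.replace(DELIM, "") for f in fields)

def read_row(line):
    # No unescaping is needed, so a plain split is enough.
    return line.split(DELIM)

row = write_row(["a\u2588b", "c, d", "e~f"])
parsed = read_row(row)
```

This only makes sense when you control the source file and have agreed that losing that character is acceptable.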
Personally, I like using « as the delimiter character in CSV-style files. I don't think I've ever come across a naturally occurring « or » in data, so that's my two cents.