
What's the best separator/delimiter character(s) for a plaintext db file? [closed]



Well, there are a few separator characters in US-ASCII: hex 1c, 1d, 1e and 1f. Plain text shouldn't normally contain them.

1c  FS  ␜  ^\  File Separator
1d  GS  ␝  ^]  Group Separator
1e  RS  ␞  ^^  Record Separator
1f  US  ␟  ^_  Unit Separator
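
As a rough sketch of how these could be used, the snippet below joins fields with US (1f) and records with RS (1e). The function names dump_records and load_records are made up for illustration, and rejecting any field that already contains a separator is just one possible policy.

```python
US = "\x1f"  # Unit Separator: placed between fields
RS = "\x1e"  # Record Separator: placed between records


def dump_records(records):
    """Serialize a list of records (each a list of strings) into one string."""
    for record in records:
        for field in record:
            # Assumption: data containing a separator is rejected, not escaped.
            if US in field or RS in field:
                raise ValueError("field contains a separator character")
    return RS.join(US.join(record) for record in records)


def load_records(text):
    """Parse a string produced by dump_records back into a list of records."""
    if text == "":
        # An empty string is treated as "no records".
        return []
    return [record.split(US) for record in text.split(RS)]


rows = [["alice", "Hello, world"], ["bob", "line one\nline two"]]
encoded = dump_records(rows)
decoded = load_records(encoded)
```

Commas and newlines inside the fields survive untouched because neither is used as a separator here.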

No matter which character you choose as your separator, you'll want to escape any instance of that character in your data.
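
Here is a minimal sketch of what that escaping might look like, assuming a pipe as the separator and a backslash as the escape character. Both choices, and the helper names, are hypothetical and not taken from the question.

```python
DELIM = "|"   # assumed field separator
ESC = "\\"    # assumed escape character


def escape(field):
    """Escape the escape character first, then the delimiter."""
    return field.replace(ESC, ESC + ESC).replace(DELIM, ESC + DELIM)


def join_fields(fields):
    """Build one line of the plaintext file from a list of strings."""
    return DELIM.join(escape(f) for f in fields)


def split_fields(line):
    """Split a line on unescaped delimiters and undo the escaping."""
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == ESC:
            # Take the next character literally, whatever it is.
            current.append(next(chars, ""))
        elif ch == DELIM:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


original = ["a|b", "c\\d", ""]
line = join_fields(original)
restored = split_fields(line)
```

Note that a plain line.split("|") would not work on the escaped line, since it cannot tell an escaped separator from a real one.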

Perhaps a tilde (~), or use a high-ASCII character.

Either way, if there's any chance that it could sneak into your data, you'd want to escape it before writing to your plaintext file.


I think the best way is to join the strings with a three-character sequence such as '@@@'.
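
A quick sketch of that approach, with one caveat worth checking: a multi-character separator can still be ambiguous if a field ends with the same character. The helper names are made up for illustration.

```python
SEP = "@@@"  # the three-character separator suggested above


def join_fields(fields):
    """Join fields with the multi-character separator, without escaping."""
    return SEP.join(fields)


def split_fields(line):
    """Split on the separator; str.split accepts multi-character separators."""
    return line.split(SEP)


ok = split_fields(join_fields(["name", "user@example.com", "42"]))

# A field ending in '@' shifts where the first '@@@' match is found.
ambiguous = split_fields(join_fields(["a@", "b"]))
```

In the second case the round trip returns ['a', '@b'] instead of ['a@', 'b'], so the data would still need escaping or validation.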


For a particular data warehousing situation where we had control over the source file, but escaping and qualifying were onerous, we were able to make the business decision that one extended ASCII character would be stripped from the data (if it ever occurs, which it hasn't).

On creation of the delimited source file, we stripped out any instances of █ (alt+219) in the data and used that character as the delimiter. As a bonus, that character is really easy to spot.
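
Roughly, the strip-then-delimit step could look like the sketch below. It assumes the file is handled as Unicode text, where █ is U+2588 (FULL BLOCK), and the function names are hypothetical rather than what we actually ran.

```python
DELIM = "\u2588"  # █ FULL BLOCK, the character reserved as the delimiter


def write_line(fields):
    """Strip the reserved character from every field, then join with it."""
    return DELIM.join(field.replace(DELIM, "") for field in fields)


def read_line(line):
    """No escaping to undo: every delimiter in the line is a real one."""
    return line.split(DELIM)


line = write_line(["a\u2588b", "c"])
fields = read_line(line)
```

The trade-off is that the stripped character is lost for good, which is why it had to be a business decision.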


Personally, I like using « as the delimiter character in CSV files. I don't think I've ever found a naturally occurring instance of « or » in my data. Just my two cents.
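
For what it's worth, here is a small sketch of this using Python's standard csv module, which accepts any single character as the delimiter. The roundtrip helper and the sample rows are made up for illustration.

```python
import csv
import io


def roundtrip(rows, delimiter="\u00ab"):  # « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    """Write rows with the given delimiter, then read them back."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    text = buf.getvalue()
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return text, list(reader)


rows = [["id", "note"], ["1", "plain, with a comma"], ["2", "has \u00ab inside"]]
text, parsed = roundtrip(rows)
```

If a « does turn up in the data, the csv writer quotes that field by default, so the file still parses. When writing to a real file, it is probably safest to open it with an explicit encoding such as UTF-8.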