Using ASCII delimiters (29-31) in modern programming

Tags:

ascii

I'm currently building a hash key string (collapsed from a map) in which the values are delimited by the ASCII unit separator, character 31 (0x1F).

This nicely solves the problem of trying to guess what ASCII characters won't be used in the string values and I don't need to worry about escaping or quoting values etc.
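A minimal sketch of what this can look like in C++, assuming the map is a `std::map<std::string, std::string>` and that no value ever contains the byte 0x1F. The names `kUnitSep`, `make_key` and `split_key` are made up for illustration only:

```cpp
#include <map>
#include <string>
#include <vector>

// ASCII unit separator (US), code 31 / 0x1F.
constexpr char kUnitSep = '\x1F';

// Collapse the values of a map into one key string, in key order.
// Assumes no value contains kUnitSep; nothing is escaped or quoted.
std::string make_key(const std::map<std::string, std::string>& fields) {
    std::string key;
    bool first = true;
    for (const auto& kv : fields) {
        if (!first) key += kUnitSep;
        key += kv.second;
        first = false;
    }
    return key;
}

// Split a key string back into its values.
// An empty key yields a single empty value.
std::vector<std::string> split_key(const std::string& key) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        const auto pos = key.find(kUnitSep, start);
        if (pos == std::string::npos) {
            parts.push_back(key.substr(start));
            break;
        }
        parts.push_back(key.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}
```

Values containing commas, pipes, quotes or whitespace pass through untouched, which is the appeal of the approach.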

However, from reading about its history, it appears to be a relic from the 1960s, and I haven't seen many examples where strings are built and tokenised using this special character, so it all seems too easy.

Are there any issues with using this delimiter in a modern application?

I'm currently doing this in a non-Unicode C++ application, but I'm interested to know how this applies more generally in other languages such as Java and C#, and with Unicode.

TownCube asked Dec 30 '12 18:12




2 Answers

All 128 ASCII characters, including the control characters 0 to 31, are carried over unchanged into the Unicode standard as U+0000 to U+007F. The only reason you don't often see the special ASCII control characters used in strings is human interfacing limitations: they do not display well (if at all) on screen or in a file, and you can't easily type them on a keyboard. They're also not allowed in unescaped form within various popular "human readable" file formats, such as XML (XML 1.0 rejects most of them even as character references).

For logical processing tasks within a program that do not need end-user interaction, however, they are perfectly suitable for whatever use you can find for them. Your particular use sounds novel and efficient and I think you should definitely run with it.
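To illustrate the Unicode point, here is a small sketch that assumes the text is held as UTF-8 bytes in a `std::string`. UTF-8 encodes U+001F as the single byte 0x1F, and every byte of a multi-byte sequence is 0x80 or above, so a byte-wise split cannot cut a character in half. The function name `split_utf8_fields` is hypothetical:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split UTF-8 encoded text on the unit separator U+001F.
// The search is byte-wise; this is safe because 0x1F never occurs
// inside a multi-byte UTF-8 sequence.
// Note: std::getline drops a trailing empty field, so "a<US>" yields
// one field, not two.
std::vector<std::string> split_utf8_fields(const std::string& utf8) {
    std::vector<std::string> fields;
    std::istringstream in(utf8);
    std::string field;
    while (std::getline(in, field, '\x1F')) {
        fields.push_back(field);
    }
    return fields;
}
```

The same reasoning carries over to the UTF-16 strings used by Java and C#, where U+001F is a single code unit that is never part of a surrogate pair.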

jstine answered Oct 01 '22 01:10



Your application is free to accept whatever binary format it pleases. However, if you need to embed arbitrary binary data in your input, you need to escape whatever delimiters or other special codes your format uses. This is true regardless of which ones you choose.
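As a rough sketch of what that escaping could look like, assume the unit separator 0x1F as the delimiter and the ASCII ESC byte 0x1B as the escape byte. The choice of escape byte and the names `escape_value`, `encode_record` and `decode_record` are arbitrary, picked just for this example:

```cpp
#include <cstddef>
#include <string>
#include <vector>

constexpr char kSep = '\x1F';  // unit separator
constexpr char kEsc = '\x1B';  // escape byte; an arbitrary choice for this sketch

// Prefix every separator or escape byte in a value with kEsc.
std::string escape_value(const std::string& value) {
    std::string out;
    out.reserve(value.size());
    for (char c : value) {
        if (c == kSep || c == kEsc) out += kEsc;
        out += c;
    }
    return out;
}

// Join arbitrary values into one record, escaping each value first.
std::string encode_record(const std::vector<std::string>& values) {
    std::string record;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0) record += kSep;
        record += escape_value(values[i]);
    }
    return record;
}

// Split a record on unescaped separators and undo the escaping.
// An empty record decodes to a single empty value.
std::vector<std::string> decode_record(const std::string& record) {
    std::vector<std::string> values(1);
    for (std::size_t i = 0; i < record.size(); ++i) {
        const char c = record[i];
        if (c == kEsc && i + 1 < record.size()) {
            values.back() += record[++i];  // take the next byte literally
        } else if (c == kSep) {
            values.emplace_back();         // start the next value
        } else {
            values.back() += c;
        }
    }
    return values;
}
```

With this in place a value may itself contain 0x1F or 0x1B and still round-trip, at the cost of no longer being able to split with a plain search for the delimiter.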

I'd also not ignore Unicode. It's 2012; by now it's rather silly to work with an outdated model for dealing with text. If your input data is textual, handle it as such.

The one question that comes to mind is why you would invent another format instead of using XML or JSON or, if you need a compact encoding, a "binary" variant of those two (Fast Infoset, msgpack, who knows what else) or ASN.1. There's probably a whole bunch of other issues you'll encounter when rolling your own that the design and tooling for those formats have already solved.

millimoose answered Oct 01 '22 00:10
