Delimiting binary sequences

I need to be able to delimit a stream of binary data. I was thinking of using something like the ASCII EOT (End of Transmission) character to do this.

However, I'm a bit concerned -- how can I know for sure that the particular byte value used for this (0b00000100) won't appear in my own binary sequences, thus giving a false positive on delimitation?

In other words, how is binary delimiting best handled?

EDIT: ...Without using a length header. Sorry guys, should have mentioned this before.

Engineer asked Dec 17 '11 00:12




1 Answer

You've got five options:

  • Use a delimiter character that is unlikely to occur. This runs the risk of you guessing incorrectly, and arbitrary binary data can contain any byte value. I don't recommend this approach.
  • Use a delimiter character and an escape sequence to include the delimiter in the data. The escape character itself then has to be escaped too; you may need to double it, depending upon what makes for easier parsing. (Think of the C \0 escape, which includes an ASCII NUL in some content.)
  • Use a delimiter phrase that you can determine does not occur in the data. (Think of the MIME message boundaries.)
  • Prepend a length field of some sort, so you know to read the following N bytes as data. This has the downside of requiring you to know this length before writing the data, which is sometimes difficult or impossible.
  • Use something far more complicated, like ASN.1, to completely describe all your content for you. (I don't know if I'd actually recommend this unless you can make good use of it -- ASN.1 is awkward to use in the best of circumstances, but it does allow completely unambiguous binary data interpretation.)
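
The second option (delimiter plus escape) can be sketched in a few lines of Python. This is only an illustration, not a standard protocol: the function names `encode_frame` and `decode_stream` are made up for this example, the delimiter is the EOT byte from the question, and using ASCII ESC (0x1B) as the escape byte is an assumption.

```python
from typing import List

DELIM = 0x04  # ASCII EOT, the delimiter proposed in the question
ESC = 0x1B    # ASCII ESC, picked here as the escape byte (an assumption)


def encode_frame(payload: bytes) -> bytes:
    """Escape every DELIM and ESC byte in payload, then append one bare DELIM."""
    out = bytearray()
    for b in payload:
        if b in (DELIM, ESC):
            out.append(ESC)  # the byte after ESC is always literal data
        out.append(b)
    out.append(DELIM)  # the only unescaped DELIM marks the end of the frame
    return bytes(out)


def decode_stream(stream: bytes) -> List[bytes]:
    """Split a stream of encoded frames back into the original payloads."""
    frames = []
    current = bytearray()
    escaped = False
    for b in stream:
        if escaped:
            current.append(b)  # literal byte, even if it equals DELIM or ESC
            escaped = False
        elif b == ESC:
            escaped = True
        elif b == DELIM:
            frames.append(bytes(current))
            current = bytearray()
        else:
            current.append(b)
    return frames  # bytes of an unterminated final frame are dropped


# Payloads may contain the delimiter and escape bytes themselves.
payloads = [b"\x00\x04\x1b\xff", b"", b"plain"]
stream = b"".join(encode_frame(p) for p in payloads)
assert decode_stream(stream) == payloads
```

No length is needed up front, so the sender can escape and emit bytes as they arrive. The cost is size overhead: a payload made up entirely of delimiter and escape bytes doubles in length.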
sarnold answered Oct 13 '22 22:10