I'm working on a parser that receives UDP data, parses it, and stores it. I'm using a BinaryReader since most of the data will be binary, though some of it will be strings. MSDN says this about the ReadString() method:
Reads a string from the current stream. The string is prefixed with the length, encoded as an integer seven bits at a time.
I completely understand it up until "seven bits at a time", which I tried to simply ignore until I started testing. I'm creating my own byte array, putting it into a MemoryStream, and attempting to read it with a BinaryReader. Here's what I first thought would work:
byte[] data = new byte[] { 3, 0, 0, 0, (byte)'C', (byte)'a', (byte)'t', };
BinaryReader reader = new BinaryReader(new MemoryStream(data));
String str = reader.ReadString();
Knowing an int is 4 bytes (and toying around long enough to find out that BinaryReader is little endian), I pass it the length of 3 and the corresponding letters. However, str ends up holding \0\0\0. If I remove the 3 zeros and just have
byte[] data = new byte[] { 3, (byte)'C', (byte)'a', (byte)'t', };
then it reads and stores Cat properly. To me this conflicts with the documentation saying that the length is supposed to be an integer. Now I'm beginning to think they simply mean a number with no decimal place and not the data type int. Does this mean that a BinaryReader can never read a string larger than 127 characters (since that would be 01111111, corresponding to the 7 bits part of the documentation)?
I'm writing up a protocol and need to completely understand what I'm getting into before I pass our documentation along to our clients.
I found the source code for BinaryReader. It uses a function called Read7BitEncodedInt(), and after looking up that documentation and the documentation for Write7BitEncodedInt() I found this:
The integer of the value parameter is written out seven bits at a time, starting with the seven least-significant bits. The high bit of a byte indicates whether there are more bytes to be written after this one. If value will fit in seven bits, it takes only one byte of space. If value will not fit in seven bits, the high bit is set on the first byte and written out. value is then shifted by seven bits and the next byte is written. This process is repeated until the entire integer has been written.
Also, Ralf found this link that better displays what's going on.
Unless they specifically say 'int' or 'Int32', they just mean an integer as in a whole number.
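If your wire format really does send the length as a 4-byte Int32, as in the first attempt in the question, ReadString() is the wrong call, and you can read the prefix and the payload yourself. The following is only a sketch under assumptions that are not in the original post: the class name Int32PrefixedStrings is made up for illustration, the length is assumed to be little endian, and the text is assumed to be ASCII.

```csharp
using System;
using System.IO;
using System.Text;

public static class Int32PrefixedStrings
{
    // Reads a string whose length is sent as a 4-byte little-endian Int32,
    // followed by that many ASCII bytes. This layout is an assumption taken
    // from the question's first attempt; it is not what ReadString() expects.
    public static string Read(BinaryReader reader)
    {
        int length = reader.ReadInt32();          // consumes 4 bytes, little endian
        if (length < 0)
            throw new InvalidDataException("Negative string length.");

        byte[] bytes = reader.ReadBytes(length);  // may return fewer bytes at end of stream
        if (bytes.Length != length)
            throw new EndOfStreamException("String payload was truncated.");

        return Encoding.ASCII.GetString(bytes);
    }

    // Convenience wrapper for a complete in-memory packet.
    public static string ReadFrom(byte[] packet)
    {
        using (var reader = new BinaryReader(new MemoryStream(packet)))
        {
            return Read(reader);
        }
    }
}
```

With this, Int32PrefixedStrings.ReadFrom(new byte[] { 3, 0, 0, 0, (byte)'C', (byte)'a', (byte)'t' }) returns "Cat". Swap Encoding.ASCII for whatever encoding your protocol actually specifies.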
By '7 bits at a time', they mean that it implements 7-bit length encoding, which seems a bit confusing at first but is actually rather straightforward. Here are some example values and how they are written out using 7-bit length encoding:
/*
decimal value binary value -> enc byte 1 enc byte 2 enc byte 3
85 00000000 00000000 01010101 -> 01010101 n/a n/a
1,365 00000000 00000101 01010101 -> 11010101 00001010 n/a
349,525 00000101 01010101 01010101 -> 11010101 10101010 00010101
*/
The binary value column in the table above is written big endian (most significant byte first) simply because I had to pick one and it's what I'm most familiar with. The 7-bit length encoding itself is little endian by its very nature, since the seven least-significant bits are written first.
Note that 85 writes out to 1 byte, 1,365 writes out to 2 bytes, and 349,525 writes out to 3 bytes.
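The rows of the table can be reproduced with a few lines of code. This is a hand-rolled sketch of the scheme as the documentation describes it, not the framework's own implementation. The names SevenBit, Encode, and Decode are made up for illustration, and the sketch only handles non-negative values, which is all a string length needs.

```csharp
using System;
using System.Collections.Generic;

public static class SevenBit
{
    // Encodes a non-negative value seven bits at a time, lowest bits first.
    public static byte[] Encode(int value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));

        var output = new List<byte>();
        uint remaining = (uint)value;
        while (remaining >= 0x80)
        {
            // Low seven bits, plus the high bit set to say "more bytes follow".
            output.Add((byte)((remaining & 0x7F) | 0x80));
            remaining >>= 7;
        }
        output.Add((byte)remaining); // final byte, high bit clear
        return output.ToArray();
    }

    // Decodes one value from the start of the buffer.
    public static int Decode(byte[] data)
    {
        int result = 0;
        int shift = 0;
        foreach (byte b in data)
        {
            if (shift >= 35)
                throw new FormatException("Too many bytes for a 32-bit value.");

            result |= (b & 0x7F) << shift; // place the next seven bits
            shift += 7;
            if ((b & 0x80) == 0) return result; // high bit clear: last byte
        }
        throw new FormatException("Input ended before the final byte.");
    }
}
```

SevenBit.Encode(85) gives the single byte 0x55, Encode(1365) gives 0xD5 0x0A, and Encode(349525) gives 0xD5 0xAA 0x15, which are the same bit patterns as in the table. Decode turns each of those back into the original value.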
Here's the same table using letters to show how each value's bits were used in the written output (dashes are zero-value bits, and the 0s and 1s are what's added by the encoding mechanism to indicate if a subsequent byte is to be written/read)...
/*
decimal value binary value -> enc byte 1 enc byte 2 enc byte 3
85 -------- -------- -AAAAAAA -> 0AAAAAAA n/a n/a
1,365 -------- -----BBB AAAAAAAA -> 1AAAAAAA 0---BBBA n/a
349,525 -----CCC BBBBBBBB AAAAAAAA -> 1AAAAAAA 1BBBBBBA 0--CCCBB
*/
So values in the range of 0 to 2^7-1 (127) will write out as 1 byte, values of 2^7 (128) to 2^14-1 (16,383) will use 2 bytes, 2^14 (16,384) to 2^21-1 (2,097,151) will take 3 bytes, and so on and so forth.
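So to answer the 127-character worry directly: longer strings are fine, the prefix just grows to a second byte. Here is a small round trip you can try, using BinaryWriter.Write(string) to produce the bytes and BinaryReader.ReadString() to read them back. The class name LongStringRoundTrip is made up for illustration. One thing to keep in mind is that the prefix counts encoded bytes (UTF-8 by default), not characters, so the two only match for plain ASCII text.

```csharp
using System;
using System.IO;

public static class LongStringRoundTrip
{
    // Writes the text with BinaryWriter.Write(string) and returns the raw bytes:
    // a 7-bit encoded length prefix followed by the encoded characters.
    public static byte[] Write(string text)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(text);
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads the string back with BinaryReader.ReadString().
    public static string Read(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            return reader.ReadString();
        }
    }
}
```

For new string('x', 200), Write returns 202 bytes. The first two are the prefix 0xC8 0x01 (200 does not fit in seven bits, so the high bit is set on the first byte), and Read returns the original 200 characters.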