
How to parse/encode binary message formats?

I need to parse and encode a legacy binary message format in Java. I began by using DataInputStream and DataOutputStream to read and write primitive types, but the message format doesn't align nicely to byte offsets and includes bit flags.

For example I have to deal with messages like this:

+----------+---+---+----------+------------+----------------+
| uint32   | b | b | uint32   | 4-bit enum | 32-byte string |
+----------+---+---+----------+------------+----------------+

Where (b) is a one-bit flag. The problem is that these fields don't align to byte boundaries, so I wouldn't be able to use DataOutputStream to encode this, since the smallest unit I can write is a byte.

Are there any libraries, standard or 3rd party, for dealing with arbitrary bit level message formats?

Edit: Thanks to @Software Monkey for forcing me to look at my spec more closely. The spec I am using does actually align on byte boundaries so DataOutputStream is appropriate. Given my original question though I would have gone with the solution proposed by @emboss.

Edit: Although the message format for this question turned out to be aligned on byte boundaries, I've since come across another message format to which the original question applies. This format defines a 6-bit character mapping where each character takes up only 6 bits, not a full byte, so character strings do not align on byte boundaries. I have found several binary output streams that tackle this problem, such as this one: http://introcs.cs.princeton.edu/java/stdlib/BinaryOut.java.html

kenen asked Jul 28 '11 17:07


2 Answers

There is a builtin byte type in Java, and you can read into byte[] buffers just fine using InputStream#read(byte[]) and write to an OutputStream using OutputStream#write(byte[], int, int), so there's no problem in that.

Regarding your messages: as you correctly noted, the smallest unit you can read at a time is a byte, so you will first have to decompose your message format into 8-bit chunks:

Let's suppose your message is in a byte[] named data. I also assume big-endian byte order (most significant byte first).

A uint32 is 32 bits long, i.e. four bytes. Be careful when parsing this in Java: ints and longs are signed, so a uint32 with its highest bit set does not fit into an int. An easy way to avoid trouble is to use a long. data[0] fills bits 31 - 24, data[1] bits 23 - 16, data[2] bits 15 - 8 and data[3] bits 7 - 0. So you need to shift them appropriately to the left and glue them together with bitwise OR. Mask with 0xFFL rather than 0xFF so that the shift happens in long arithmetic; with an int mask, a data[0] of 0x80 or above would produce a negative result:

long uint32 = ((data[0]&0xFFL) << 24) | 
              ((data[1]&0xFFL) << 16) | 
              ((data[2]&0xFFL) << 8)  | 
               (data[3]&0xFFL);
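
As a rough sketch of the same idea wrapped into reusable helpers, the class below reads and writes a big-endian uint32 at an arbitrary byte offset. The class and method names (Uint32Codec, read, readWithBuffer, write) are made up for illustration; readWithBuffer shows an alternative that relies on the standard java.nio.ByteBuffer, assuming the value starts on a byte boundary.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class, not part of any library.
class Uint32Codec {

    // Manual variant: mask each byte to 0..255 as a long, then shift and OR.
    static long read(byte[] data, int offset) {
        return ((data[offset]     & 0xFFL) << 24)
             | ((data[offset + 1] & 0xFFL) << 16)
             | ((data[offset + 2] & 0xFFL) << 8)
             |  (data[offset + 3] & 0xFFL);
    }

    // ByteBuffer variant: read a signed int, then mask it into an unsigned long.
    static long readWithBuffer(byte[] data, int offset) {
        int raw = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN).getInt(offset);
        return raw & 0xFFFFFFFFL;
    }

    // Encoding: store the low 32 bits of value, most significant byte first.
    static void write(byte[] data, int offset, long value) {
        data[offset]     = (byte) (value >>> 24);
        data[offset + 1] = (byte) (value >>> 16);
        data[offset + 2] = (byte) (value >>> 8);
        data[offset + 3] = (byte) value;
    }
}
```

Both read variants only work when the field starts on a byte boundary, which holds for the first uint32 of the example message but not for the second one.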

Next, there are two single bits. I suppose you have to check whether they are "on" (1) or "off" (0). To do this, you use bit masks and test your byte with bitwise AND.

First bit: ( binary mask | 1 0 0 0 0 0 0 0 | = 128 = 0x80 )

if ( (data[4] & 0x80 ) == 0x80 ) // on

Second bit: ( binary mask | 0 1 0 0 0 0 0 0 | = 64 = 0x40 )

if ( (data[4] & 0x40 ) == 0x40 ) // on
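
Since the question also asks about encoding, here is a small sketch of testing and setting such flags. The names BitFlags, FIRST_FLAG and SECOND_FLAG are invented for this example; the masks are the same 0x80 and 0x40 used above.

```java
// Hypothetical helper class, not part of any library.
class BitFlags {
    static final int FIRST_FLAG  = 0x80; // binary 1000 0000
    static final int SECOND_FLAG = 0x40; // binary 0100 0000

    // True if any bit selected by mask is set in b.
    static boolean isSet(byte b, int mask) {
        return (b & mask) != 0;
    }

    // Returns b with the masked bits switched on or off; other bits are untouched.
    static byte with(byte b, int mask, boolean on) {
        return (byte) (on ? (b | mask) : (b & ~mask));
    }
}
```

When encoding, you would start from data[4] = 0, apply with(...) once per flag, and then OR in the top six bits of the following uint32.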

To compose the next uint32, you have to assemble bytes across the byte boundaries of the underlying data. E.g. for its first byte, take the remaining 6 bits of data[4] and shift them two to the left (they become bits 7 to 2 of that byte), then "add" the two highest bits of data[5] by shifting them 6 bits to the right (they fill the remaining bits 1 and 0). "Adding" means bitwise OR'ing. The cast to byte drops the two flag bits that the left shift pushes above bit 7:

byte uint32Byte1 = (byte)( ((data[4]&0xFF) << 2) | ((data[5]&0xFF) >> 6) );

Building your uint32 is then the same procedure as in the first example. And so on and so forth.
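
If you would rather not hand-write the shifts for every field, the procedure can be generalised. The following is only a sketch: BitFields, readBits and writeBits are hypothetical names, and it assumes bits are numbered from the most significant bit of data[0], as in the examples above. It processes one bit at a time, which is simple but not fast.

```java
// Hypothetical helper class, not part of any library.
class BitFields {

    // Reads count bits (1..63) starting at absolute bit position bitPos,
    // most significant bit first, and returns them as an unsigned value.
    static long readBits(byte[] data, int bitPos, int count) {
        long value = 0;
        for (int i = 0; i < count; i++) {
            int pos = bitPos + i;
            // pos >> 3 selects the byte, 7 - (pos & 7) the bit inside it.
            int bit = (data[pos >> 3] >> (7 - (pos & 7))) & 1;
            value = (value << 1) | bit;
        }
        return value;
    }

    // Writes the low count bits of value at absolute bit position bitPos,
    // most significant bit first. Neighbouring bits are left unchanged.
    static void writeBits(byte[] data, int bitPos, int count, long value) {
        for (int i = 0; i < count; i++) {
            int pos = bitPos + i;
            int mask = 1 << (7 - (pos & 7));
            if (((value >>> (count - 1 - i)) & 1) != 0) {
                data[pos >> 3] |= mask;
            } else {
                data[pos >> 3] &= ~mask;
            }
        }
    }

    public static void main(String[] args) {
        // 32 + 1 + 1 + 32 + 4 + 256 = 326 bits, rounded up to 41 bytes.
        byte[] msg = new byte[41];
        writeBits(msg, 0, 32, 123456789L); // first uint32
        writeBits(msg, 32, 1, 1);          // first flag
        writeBits(msg, 33, 1, 0);          // second flag
        writeBits(msg, 34, 32, 42L);       // second uint32
        writeBits(msg, 66, 4, 9);          // 4-bit enum

        System.out.println("uint32 #1 = " + readBits(msg, 0, 32));
        System.out.println("flag 1    = " + (readBits(msg, 32, 1) == 1));
        System.out.println("uint32 #2 = " + readBits(msg, 34, 32));
        System.out.println("enum      = " + readBits(msg, 66, 4));
    }
}
```

With the layout from the question, the 32-byte string starts at bit 70, so it is not byte aligned either; each of its bytes could be read with readBits(msg, 70 + 8 * i, 8) for i from 0 to 31.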

emboss answered Nov 04 '22 20:11


With Java Binary Block Parser (JBBP), the script to parse the message would be:

  class Parsed {
    @Bin int field1;
    @Bin(type = BinType.BIT) boolean field2;
    @Bin(type = BinType.BIT) boolean field3;
    @Bin int field4;
    @Bin(type = BinType.BIT) int enums;
    @Bin(type = BinType.UBYTE_ARRAY) String str;
  }

  Parsed parsed = JBBPParser
      .prepare("int field1; bit field2; bit field3; int field4; bit:4 enums; ubyte [32] str;")
      .parse(STREAM)
      .mapTo(Parsed.class);
Igor Maznitsa answered Nov 04 '22 20:11