Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Tokenising binary data in java

Tags:

java

binary

I am writing some code to handle a stream of binary data. It is received in chunks represented by byte arrays. Combined, the byte arrays represent sequential stream of messages, each of which ends with the same constant terminator value (0xff in my case). However, the terminator value can appear at any point in any given chunk of data. A single chunk can contain part of a message, multiple messages and anything in between.

Here is a small sampling of what data handled by this might look like:

[0x00, 0x0a, 0xff, 0x01]
[0x01, 0x01]
[0xff, 0x01, 0xff]

This data should be converted into these messages:

[0x00, 0x0a, 0xff]
[0x01, 0x01, 0x01, 0xff]
[0x01, 0xff]

I have written a small class to handle this. It has a method to add some data in byte array format, which is then placed in a buffer array. When the terminator character is encountered, the byte array is cleared and the complete message is placed in a message queue, which can be accessed using hasNext() and next() methods (similar to an iterator).

This solution works fine, but as I finished it, I realized that there might already be some stable, performant and tested code in an established library that I could be using instead.

So my question is - do you know of a utility library that would have such a class, or maybe there is something in the standard Java 6 library that can do this already?

like image 713
Sevas Avatar asked Nov 12 '22 20:11

Sevas


1 Answers

I don't think you need a framework as a custom parser is simple enough.

InputStream in = new ByteArrayInputStream(new byte[]{
        0x00, 0x0a, (byte) 0xff,
        0x01, 0x01, 0x01, (byte) 0xff,
        0x01, (byte) 0xff});

ByteArrayOutputStream baos = new ByteArrayOutputStream();
for (int b; (b = in.read()) >= 0; ) {
    baos.write(b);
    if (b == 0xff) {
        byte[] bytes = baos.toByteArray();
        System.out.println(Arrays.toString(bytes));
        baos = new ByteArrayOutputStream();
    }
}

prints as (byte) 0xFF == -1

[0, 10, -1]
[1, 1, 1, -1]
[1, -1]
like image 114
Peter Lawrey Avatar answered Nov 15 '22 11:11

Peter Lawrey