
Fast Java binary format parsing [closed]

I regularly deal with various binary file formats in our Java EE application suite: reading them into some class structure and writing them back. I need operations like:

  • read single byte / short / int / long, sometimes different endianness (little/big)
  • read single bits in flags, i.e.

    | uint16_t | 4 bits | 4 bits | uint16_t |
    

    should become something like

    a = stream.readUint16();
    byte tmp = stream.readUint8();
    b = (tmp & 0xf0) >> 4;
    c = (tmp & 0xf);
    d = stream.readUint16();
    
  • read strings in different encodings, sometimes dynamic length strings with something like a \0 ending

  • seeking in a file (to find data at offsets read earlier), knowing the current position, and knowing how much is left to parse in the current data block
  • last, but not least, it should be fast; at least not an order of magnitude slower than declaring a typedef struct in C, reading it as a block and typecasting it in memory

So far I've analyzed my options and found that there are:

  • RandomAccessFile - the best option in standard Java; it has proper seeking and position methods, string reading, etc., but it is sometimes unbearably slow because its read operations are unbuffered; it also has no bit-level access to the stream and no support for different endianness
  • FileInputStream - can only read individual bytes, so one has to reconstruct primitive data types by hand; no seeking
  • *Reader interfaces - basically, can only read bytes and arrays of bytes; they can skip, mark and reset, but they tend to leak memory if seeking is done multiple times as reset(); skip(seekAmount);
  • https://github.com/raydac/java-binary-block-parser - almost exactly what I'm looking for, i.e. a declarative specification of a format and then, voila, I've got classes; but it's essentially an interpreter, so there are 2 major problems: (a) it's slow in high-demand environments, (b) there are multiple type safety problems with its runtime-generated, reflection-like style
  • http://preon.codehaus.org/ - has lots of great reviews, but seems to be no longer developed; the site is down :(

I've searched Google, I've searched StackOverflow. This question, "How to parse/encode binary message formats?", addresses the same issue, but with a weird non-aligned bits requirement which I don't have.

So, the question is: am I overlooking something, and are there any better solutions for this problem that address everything I've mentioned?

Yolanda V. Moore asked Mar 12 '16 11:03





1 Answer

ByteBuffer has everything you need.
It's also probably the fastest option in pure Java (not counting JNI, sun.misc.Unsafe etc.)

  • get, getShort, getInt etc. to read all primitive types and byte arrays;
  • order to switch between BIG_ENDIAN and LITTLE_ENDIAN;
  • position for seeking;
  • CharsetEncoder, CharsetDecoder can encode/decode strings directly in ByteBuffer;
  • FileChannel.map creates a ByteBuffer mapped to a file;
  • there are two kinds of ByteBuffer: heap buffers for data on the Java heap and direct buffers for data off heap.
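
To make the list above concrete, here is a minimal sketch of how the | uint16_t | 4 bits | 4 bits | uint16_t | layout from the question might be read with ByteBuffer, followed by a \0-terminated string and a big-endian int. The class and method names (ByteBufferSketch, Header, readHeader, readCString, mapFile) and the sample bytes are hypothetical and exist only for illustration; only the java.nio calls are real API. The string helper assumes an ASCII-compatible encoding.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical names throughout; only the java.nio calls are real API.
class ByteBufferSketch {

    /** Holder for the | uint16_t | 4 bits | 4 bits | uint16_t | layout from the question. */
    static final class Header {
        int a, b, c, d;
    }

    static Header readHeader(ByteBuffer buf) {
        Header h = new Header();
        h.a = buf.getShort() & 0xFFFF; // Java has no unsigned short, so widen and mask
        int tmp = buf.get() & 0xFF;    // one byte holding both 4-bit fields
        h.b = tmp >>> 4;               // high nibble
        h.c = tmp & 0x0F;              // low nibble
        h.d = buf.getShort() & 0xFFFF;
        return h;
    }

    /**
     * Reads a \0-terminated string and leaves the position after the terminator.
     * Scanning for a single zero byte assumes an ASCII-compatible charset
     * (US-ASCII, ISO-8859-1, UTF-8), not UTF-16.
     */
    static String readCString(ByteBuffer buf, Charset charset) {
        int end = buf.position();
        while (end < buf.limit() && buf.get(end) != 0) {
            end++;                         // absolute get: does not move the position
        }
        ByteBuffer view = buf.duplicate(); // shares content, has its own position/limit
        view.limit(end);
        String s = charset.decode(view).toString();
        buf.position(Math.min(end + 1, buf.limit()));
        return s;
    }

    /** Maps a whole file read-only; a single mapping is limited to 2 GB. */
    static ByteBuffer mapFile(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = {
            0x34, 0x12,               // a = 0x1234 (little-endian)
            (byte) 0xAB,              // b = 0xA, c = 0xB
            (byte) 0xFE, (byte) 0xFF, // d = 0xFFFE (little-endian)
            'h', 'i', 0,              // "hi" plus terminator
            0x00, 0x00, 0x00, 0x2A    // 42 as a big-endian int
        };
        Path tmp = Files.createTempFile("sketch", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, raw);

        ByteBuffer buf = mapFile(tmp).order(ByteOrder.LITTLE_ENDIAN);
        Header h = readHeader(buf);
        String s = readCString(buf, StandardCharsets.US_ASCII);
        buf.order(ByteOrder.BIG_ENDIAN);  // byte order can be switched mid-stream
        int n = buf.getInt();
        System.out.printf("a=%d b=%d c=%d d=%d s=%s n=%d pos=%d remaining=%d%n",
                h.a, h.b, h.c, h.d, s, n, buf.position(), buf.remaining());
        // prints: a=4660 b=10 c=11 d=65534 s=hi n=42 pos=12 remaining=0

        buf.position(5);                  // seek to an absolute offset
        System.out.println(readCString(buf, StandardCharsets.US_ASCII)); // prints: hi
    }
}
```

In this sketch position() and remaining() cover the "where am I" and "how much is left" requirements, and position(int) does the seeking. For a bounded data block, one option would be to set limit() (or take a slice) so that remaining() reports what is left in that block only.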
apangin answered Sep 23 '22 12:09
