Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing of binary data with scala

Tags:

scala

I need to parse some simple binary Files. (The files contains n entries which consists of several signed/unsigned Integers of different sizes etc.)

In the moment i do the parsing "by hand". Does somebody know a library which helps to do this type of parsing?

Edit: "By hand" means that i get the Data Byte by Byte sort it in to the correct Order and convert it to an Int/Byte etc. Also some of the Data is unsigned.

like image 349
nuriaion Avatar asked Apr 19 '10 13:04

nuriaion


5 Answers

I've used the sbinary library before and it's very nice. The documentation is a little sparse but I would suggest first looking at the old wiki page as that gives you a starting point. Then check out the test specifications, as that gives you some very nice examples.

The primary benefit of sbinary is that it gives you a way to describe the wire format of each object as a Format object. You can then encapsulate those formatted types in a higher level Format object and Scala does all the heavy lifting of looking up that type as long as you've included it in the current scope as an implicit object.

As I say below, I'd now recommend people use scodec instead of sbinary. As an example of how to use scodec, I'll implement how to read a binary representation in memory of the following C struct:

struct ST
{
   long long ll; // @ 0
   int i;        // @ 8
   short s;      // @ 12
   char ch1;     // @ 14
   char ch2;     // @ 15
} ST;

A matching Scala case class would be:

case class ST(ll: Long, i: Int, s: Short, ch1: String, ch2: String)

I'm making things a bit easier for myself by just saying we're storing Strings instead of Chars and I'll say that they are UTF-8 characters in the struct. I'm also not dealing with endian details or the actual size of the long and int types on this architecture and just assuming that they are 64 and 32 respectively.

Scodec parsers generally use combinators to build higher level parsers from lower level ones. So for below, we'll define a parser which combines a 8 byte value, a 4 byte value, a 2 byte value, a 1 byte value and one more 1 byte value. The return of this combination is a Tuple codec:

val myCodec: Codec[Long ~ Int ~ Short ~ String ~ String] = 
  int64 ~ int32 ~ short16 ~ fixedSizeBits(8L, utf8) ~ fixedSizeBits(8L, utf8)

We can then transform this into the ST case class by calling the xmap function on it which takes two functions, one to turn the Tuple codec into the destination type and another function to take the destination type and turn it into the Tuple form:

val stCodec: Codec[ST] = myCodec.xmap[ST]({case ll ~ i ~ s ~ ch1 ~ ch2 => ST(ll, i, s, ch1, ch2)}, st => st.ll ~ st.i ~ st.s ~ st.ch1 ~ st.ch2)

Now, you can use the codec like so:

stCodec.encode(ST(1L, 2, 3.shortValue, "H", "I"))
res0: scodec.Attempt[scodec.bits.BitVector] = Successful(BitVector(128 bits, 0x00000000000000010000000200034849))

res0.flatMap(stCodec.decode)
=> res1: scodec.Attempt[scodec.DecodeResult[ST]] = Successful(DecodeResult(ST(1,2,3,H,I),BitVector(empty)))

I'd encourage you to look at the Scaladocs and not at the Guide as there's much more detail in the Scaladocs. The guide is a good start at the very basics but it doesn't get into the composition part much but the Scaladocs cover that pretty well.

like image 177
Aaron Avatar answered Nov 09 '22 21:11

Aaron


Scala itself doesn't have a binary data input library, but the java.nio package does a decent job. It doesn't explicitly handle unsigned data--neither does Java, so you need to figure out how you want to manage it--but it does have convenience "get" methods that take byte order into account.

like image 39
Rex Kerr Avatar answered Nov 09 '22 23:11

Rex Kerr


I don't know what you mean with "by hand" but using a simple DataInputStream (apidoc here) is quite concise and clear:

val dis = new DataInputStream(yourSource)

dis.readFloat()
dis.readDouble()
dis.readInt()
// and so on

Taken from another SO question: http://preon.sourceforge.net/, it should be a framework to do binary encoding/decoding.. see if it has the capabilities you need

like image 26
Jack Avatar answered Nov 09 '22 21:11

Jack


If you are looking for a Java based solution, then I will shamelessly plug Preon. You just annotate the in memory Java data structure, and ask Preon for a Codec, and you're done.

like image 34
Wilfred Springer Avatar answered Nov 09 '22 22:11

Wilfred Springer


Byteme is a parser combinators library for doing binary. You can try to use it for your tasks.

like image 1
om-nom-nom Avatar answered Nov 09 '22 22:11

om-nom-nom