Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Framework for building structured binary data parsers?

I have some experience with Pragmatic-Programmer-type code generation: specifying a data structure in a platform-neutral format and writing templates for a code generator that consume these data structure files and produce code that pulls raw bytes into language-specific data structures, does scaling on the numeric data, prints out the data, etc. The nice pragmatic(TM) ideas are that (a) I can change data structures by modifying my specification file and regenerating the source (which is DRY and all that) and (b) I can add additional functions that can be generated for all of my structures just by modifying my templates.

What I had used was a Perl script called Jeeves which worked, but it's general purpose, and any functions I wanted to write to manipulate my data I was writing from the ground up.

Are there any frameworks that are well-suited for creating parsers for structured binary data? What I've read of Antlr suggests that that's overkill. My current target langauges of interest are C#, C++, and Java, if it matters.

Thanks as always.

Edit: I'll put a bounty on this question. If there are any areas that I should be looking it (keywords to search on) or other ways of attacking this problem that you've developed yourself, I'd love to hear about them.

like image 837
John Avatar asked Mar 14 '11 23:03

John


2 Answers

Also you may look to a relatively new project Kaitai Struct, which provides a language for that purpose and also has a good IDE: Kaitai.io

like image 171
beSpark Avatar answered Sep 22 '22 05:09

beSpark


You might find ASN.1 interesting, as it provide an absract way to describe the data you might be processing. If you use ASN.1 to describe the data abstractly, you need a way to map that abstract data to concrete binary streams, for which ECN (Encoding Control Notation) is likely the right choice.

The New Jersey Machine Toolkit is actually focused on binary data streams corresponding to instruction sets, but I think that's a superset of just binary streams. It has very nice facilities for defining fields in terms of bit strings, and automatically generating accessors and generators of such. This might be particularly useful if your binary data structures contain pointers to other parts of the data stream.

like image 38
Ira Baxter Avatar answered Sep 24 '22 05:09

Ira Baxter