Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing Binary Data in C?

Tags:

Are there any libraries or guides for how to read and parse binary data in C?

I am looking at some functionality that will receive TCP packets on a network socket and then parse that binary data according to a specification, turning the information into a more useable form by the code.

Are there any libraries out there that do this, or even a primer on performing this type of thing?

like image 633
kaybenleroll Avatar asked Nov 26 '08 17:11

kaybenleroll


People also ask

What is a binary parser?

Binary Parser is a function for Soracom Air devices that parses binary data with a fixed format and converts the data into easy-to-use JSON data, which can then be passed to Soracom Beam, Funnel, Funk, and Harvest Data. Binary Parser is available for Air for Cellular, Air for Sigfox, and Air for LoRaWAN.

How do I write a BIN file?

To write to a binary fileUse the WriteAllBytes method, supplying the file path and name and the bytes to be written. This example appends the data array CustomerData to the file named CollectedData. dat .


1 Answers

I have to disagree with many of the responses here. I strongly suggest you avoid the temptation to cast a struct onto the incoming data. It seems compelling and might even work on your current target, but if the code is ever ported to another target/environment/compiler, you'll run into trouble. A few reasons:

Endianness: The architecture you're using right now might be big-endian, but your next target might be little-endian. Or vice-versa. You can overcome this with macros (ntoh and hton, for example), but it's extra work and you have make sure you call those macros every time you reference the field.

Alignment: The architecture you're using might be capable of loading a mutli-byte word at an odd-addressed offset, but many architectures cannot. If a 4-byte word straddles a 4-byte alignment boundary, the load may pull garbage. Even if the protocol itself doesn't have misaligned words, sometimes the byte stream itself is misaligned. (For example, although the IP header definition puts all 4-byte words on 4-byte boundaries, often the ethernet header pushes the IP header itself onto a 2-byte boundary.)

Padding: Your compiler might choose to pack your struct tightly with no padding, or it might insert padding to deal with the target's alignment constraints. I've seen this change between two versions of the same compiler. You could use #pragmas to force the issue, but #pragmas are, of course, compiler-specific.

Bit Ordering: The ordering of bits inside C bitfields is compiler-specific. Plus, the bits are hard to "get at" for your runtime code. Every time you reference a bitfield inside a struct, the compiler has to use a set of mask/shift operations. Of course, you're going to have to do that masking/shifting at some point, but best not to do it at every reference if speed is a concern. (If space is the overriding concern, then use bitfields, but tread carefully.)

All this is not to say "don't use structs." My favorite approach is to declare a friendly native-endian struct of all the relevant protocol data without any bitfields and without concern for the issues, then write a set of symmetric pack/parse routines that use the struct as a go-between.

typedef struct _MyProtocolData
{
    Bool myBitA;  // Using a "Bool" type wastes a lot of space, but it's fast.
    Bool myBitB;
    Word32 myWord;  // You have a list of base types like Word32, right?
} MyProtocolData;

Void myProtocolParse(const Byte *pProtocol, MyProtocolData *pData)
{
    // Somewhere, your code has to pick out the bits.  Best to just do it one place.
    pData->myBitA = *(pProtocol + MY_BITS_OFFSET) & MY_BIT_A_MASK >> MY_BIT_A_SHIFT;
    pData->myBitB = *(pProtocol + MY_BITS_OFFSET) & MY_BIT_B_MASK >> MY_BIT_B_SHIFT;

    // Endianness and Alignment issues go away when you fetch byte-at-a-time.
    // Here, I'm assuming the protocol is big-endian.
    // You could also write a library of "word fetchers" for different sizes and endiannesses.
    pData->myWord  = *(pProtocol + MY_WORD_OFFSET + 0) << 24;
    pData->myWord += *(pProtocol + MY_WORD_OFFSET + 1) << 16;
    pData->myWord += *(pProtocol + MY_WORD_OFFSET + 2) << 8;
    pData->myWord += *(pProtocol + MY_WORD_OFFSET + 3);

    // You could return something useful, like the end of the protocol or an error code.
}

Void myProtocolPack(const MyProtocolData *pData, Byte *pProtocol)
{
    // Exercise for the reader!  :)
}

Now, the rest of your code just manipulates data inside the friendly, fast struct objects and only calls the pack/parse when you have to interface with a byte stream. There's no need for ntoh or hton, and no bitfields to slow down your code.

like image 184
Casey Barker Avatar answered Oct 02 '22 23:10

Casey Barker