Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to interpret binary data in C++?

Tags:

c++

embedded

byte

I am sending and receiving binary data to/from a device in packets (64 byte). The data has a specific format, parts of which vary with different request / response.

Now I am designing an interpreter for the received data. Simply reading the data by positions is OK, but doesn't look that cool when I have a dozen different response formats. I am currently thinking about creating a few structs for that purpose, but I don't know how will it go with padding.

Maybe there's a better way?


Related:

  • Safe, efficient way to access unaligned data in a network packet from C
like image 503
Saulius Žemaitaitis Avatar asked May 12 '09 11:05

Saulius Žemaitaitis


3 Answers

You need to use structs and or unions. You'll need to make sure your data is properly packed on both sides of the connection and you may want to translate to and from network byte order on each end if there is any chance that either side of the connection could be running with a different endianess.

As an example:

#pragma pack(push)  /* push current alignment to stack */
#pragma pack(1)     /* set alignment to 1 byte boundary */
typedef struct {
    unsigned int    packetID;  // identifies packet in one direction
    unsigned int    data_length;
    char            receipt_flag;  // indicates to ack packet or keep sending packet till acked
    char            data[]; // this is typically ascii string data w/ \n terminated fields but could also be binary
} tPacketBuffer ;
#pragma pack(pop)   /* restore original alignment from stack */

and then when assigning:

packetBuffer.packetID = htonl(123456);

and then when receiving:

packetBuffer.packetID = ntohl(packetBuffer.packetID);

Here are some discussions of Endianness and Alignment and Structure Packing

If you don't pack the structure it'll end up aligned to word boundaries and the internal layout of the structure and it's size will be incorrect.

like image 72
Robert S. Barnes Avatar answered Oct 15 '22 11:10

Robert S. Barnes


I've done this innumerable times before: it's a very common scenario. There's a number of things which I virtually always do.

Don't worry too much about making it the most efficient thing available.

If we do wind up spending a lot of time packing and unpacking packets, then we can always change it to be more efficient. Whilst I've not encountered a case where I've had to as yet, I've not been implementing network routers!

Whilst using structs/unions is the most efficient approach in term of runtime, it comes with a number of complications: convincing your compiler to pack the structs/unions to match the octet structure of the packets you need, work to avoid alignment and endianness issues, and a lack of safety since there is no or little opportunity to do sanity checks on debug builds.

I often wind up with an architecture including the following kinds of things:

  • A packet base class. Any common data fields are accessible (but not modifiable). If the data isn't stored in a packed format, then there's a virtual function which will produce a packed packet.
  • A number of presentation classes for specific packet types, derived from common packet type. If we're using a packing function, then each presentation class must implement it.
  • Anything which can be inferred from the specific type of the presentation class (i.e. a packet type id from a common data field), is dealt with as part of initialisation and is otherwise unmodifiable.
  • Each presentation class can be constructed from an unpacked packet, or will gracefully fail if the packet data is invalid for the that type. This can then be wrapped up in a factory for convenience.
  • If we don't have RTTI available, we can get "poor-man's RTTI" using the packet id to determine which specific presentation class an object really is.

In all of this, it's possible (even if just for debug builds) to verify that each field which is modifiable is being set to a sane value. Whilst it might seem like a lot of work, it makes it very difficult to have an invalidly formatted packet, a pre-packed packets contents can be easilly checked by eye using a debugger (since it's all in normal platform-native format variables).

If we do have to implement a more efficient storage scheme, that too can be wrapped in this abstraction with little additional performance cost.

like image 29
Wuggy Avatar answered Oct 15 '22 09:10

Wuggy


It's hard to say what the best solution is without knowing the exact format(s) of the data. Have you considered using unions?

like image 27
Rune Aamodt Avatar answered Oct 15 '22 09:10

Rune Aamodt