
Parsing binary message stream in C/C++

I'm writing a decoder for a binary protocol (Javad GRIL protocol). It consists of about a hundred messages, with data in the following format:

struct MsgData {
    uint8_t num;
    float x, y, z;
    uint8_t elevation;
    ...
};

The fields are ANSI-encoded binary numbers which follow each other with no gaps. The simplest way to parse such messages is to cast an input array of bytes to the appropriate type. The problem is that the data in stream are packed, i.e. unaligned.

On x86 this can be solved by using #pragma pack(1). However, that won't work on some other platforms or will incur performance overhead due to further work with misaligned data.
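For comparison, here is a minimal sketch of the portable alternative to casting: copy each field out of the packed buffer with std::memcpy into an aligned variable. The names load_unaligned, parse_msg_data and kMsgDataWireSize are made up for illustration. The sketch covers only the five fields shown above (not the elided ones) and assumes a 4-byte float whose wire byte order matches the host's.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");

// Hypothetical helper: copies sizeof(T) bytes from a possibly misaligned
// address into an aligned local. Assumes wire byte order == host byte order.
template <typename T>
T load_unaligned(const unsigned char* p)
{
    T value;
    std::memcpy(&value, p, sizeof(T));
    return value;
}

struct MsgData {
    std::uint8_t num;
    float x, y, z;
    std::uint8_t elevation;
};

// Packed size of the five fields shown in the question: 1 + 4 + 4 + 4 + 1.
const std::size_t kMsgDataWireSize = 14;

// Reads each field at its packed offset; the caller must supply at least
// kMsgDataWireSize bytes.
MsgData parse_msg_data(const unsigned char* p)
{
    MsgData m;
    m.num       = load_unaligned<std::uint8_t>(p);
    m.x         = load_unaligned<float>(p + 1);
    m.y         = load_unaligned<float>(p + 5);
    m.z         = load_unaligned<float>(p + 9);
    m.elevation = load_unaligned<std::uint8_t>(p + 13);
    return m;
}
```

This avoids misaligned accesses entirely, but it is exactly the kind of per-message function that becomes tedious to write by hand for every message type.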

Another way is to write a specific parse function for each message type, but as I've mentioned, the protocol includes about a hundred messages.

Yet another alternative is to use something like the Perl unpack() function and store the message format somewhere. Say, we can #define MsgDataFormat "CfffC" and then call unpack(pMsgBody, MsgDataFormat). This is much shorter but still error-prone and redundant. Moreover, the format can be more complicated because messages can contain arrays, so the parser will be slow and complex.
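To make the unpack() idea concrete, here is a rough sketch using C++11 variadic templates. The format_code trait and this unpack signature are invented for illustration, only the 'C' and 'f' codes are handled, and the sketch does no bounds checking and assumes host byte order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical mapping from a C++ type to a Perl-style format character.
template <typename T> struct format_code;
template <> struct format_code<std::uint8_t> { static constexpr char value = 'C'; };
template <> struct format_code<float>        { static constexpr char value = 'f'; };

// End of recursion: succeeds only if the format string is used up as well.
inline bool unpack(const unsigned char*, const char* fmt)
{
    return *fmt == '\0';
}

// Checks the next format character against the type of the next output
// argument, copies the bytes, then recurses on the remaining arguments.
// Returns false on the first mismatch (earlier outputs are already written).
template <typename T, typename... Rest>
bool unpack(const unsigned char* p, const char* fmt, T& out, Rest&... rest)
{
    if (*fmt != format_code<T>::value)
        return false;
    std::memcpy(&out, p, sizeof(T));
    return unpack(p + sizeof(T), fmt + 1, rest...);
}
```

A call would look like unpack(pMsgBody, "CfffC", msg.num, msg.x, msg.y, msg.z, msg.elevation). The redundancy is visible right there: the field list is stated once in the format string and once more in the arguments.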

Is there any common and effective solution? I've read this post and Googled around but didn't find a better way to do it.

Maybe C++ has a solution?

gaga asked Jan 20 '11 15:01


4 Answers

Ok, the following compiles for me with VC10 and with GCC 4.5.1 (on ideone.com). I think all this needs of C++1x is <tuple>, which should be available (as std::tr1::tuple) in older compilers as well.

It still needs you to type some code for each member, but that is very minimal code. (See my explanation at the end.)

#include <iostream>
#include <tuple>

typedef unsigned char uint8_t;
typedef unsigned char byte_t;

struct MsgData {
    uint8_t num;
    float x;
    uint8_t elevation;

    static const std::size_t buffer_size = sizeof(uint8_t)
                                         + sizeof(float) 
                                         + sizeof(uint8_t);

    std::tuple<uint8_t&,float&,uint8_t&> get_tied_tuple()
    {return std::tie(num, x, elevation);}
    std::tuple<const uint8_t&,const float&,const uint8_t&> get_tied_tuple() const
    {return std::tie(num, x, elevation);}
};

// needed only for test output
inline std::ostream& operator<<(std::ostream& os, const MsgData& msgData)
{
    os << '[' << static_cast<int>(msgData.num) << ' ' 
       << msgData.x << ' ' << static_cast<int>(msgData.elevation) << ']';
    return os;
}

#include <cstring> // std::memcpy, used by read_value() and write_value()

namespace detail {

    // overload the following two for types that need special treatment
    // (std::memcpy instead of a pointer cast, because the fields in the
    // packed buffer may be misaligned)
    template<typename T>
    const byte_t* read_value(const byte_t* bin, T& val)
    {
        std::memcpy(&val, bin, sizeof(T));
        return bin + sizeof(T)/sizeof(byte_t);
    }
    template<typename T>
    byte_t* write_value(byte_t* bin, const T& val)
    {
        std::memcpy(bin, &val, sizeof(T));
        return bin + sizeof(T)/sizeof(byte_t);
    }

    template< typename MsgTuple, unsigned int Size = std::tuple_size<MsgTuple>::value >
    struct msg_serializer;

    template< typename MsgTuple >
    struct msg_serializer<MsgTuple,0> {
        static const byte_t* read(const byte_t* bin, MsgTuple&) {return bin;}
        static byte_t* write(byte_t* bin, const MsgTuple&)      {return bin;}
    };

    template< typename MsgTuple, unsigned int Size >
    struct msg_serializer {
        static const byte_t* read(const byte_t* bin, MsgTuple& msg)
        {
            return read_value( msg_serializer<MsgTuple,Size-1>::read(bin, msg)
                             , std::get<Size-1>(msg) );
        }
        static byte_t* write(byte_t* bin, const MsgTuple& msg)
        {
            return write_value( msg_serializer<MsgTuple,Size-1>::write(bin, msg)
                              , std::get<Size-1>(msg) );
        }
    };

    template< class MsgTuple >
    inline const byte_t* do_read_msg(const byte_t* bin, MsgTuple msg)
    {
        return msg_serializer<MsgTuple>::read(bin, msg);
    }

    template< class MsgTuple >
    inline byte_t* do_write_msg(byte_t* bin, const MsgTuple& msg)
    {
        return msg_serializer<MsgTuple>::write(bin, msg);
    }
}

template< class Msg >
inline const byte_t* read_msg(const byte_t* bin, Msg& msg)
{
    return detail::do_read_msg(bin, msg.get_tied_tuple());
}

template< class Msg >
inline byte_t* write_msg(byte_t* bin, const Msg& msg)
{
    return detail::do_write_msg(bin, msg.get_tied_tuple());
}

int main()
{
    byte_t buffer[MsgData::buffer_size];

    std::cout << "buffer size is " << MsgData::buffer_size << '\n';

    MsgData msgData;
    std::cout << "initializing data...";
    msgData.num = 42;
    msgData.x = 1.7f;
    msgData.elevation = 17;
    std::cout << "data is now " << msgData << '\n';
    write_msg(buffer, msgData);

    std::cout << "clearing data...";
    msgData = MsgData();
    std::cout << "data is now " << msgData << '\n';

    std::cout << "reading data...";
    read_msg(buffer, msgData);
    std::cout << "data is now " << msgData << '\n';

    return 0;
}

For me this prints

buffer size is 6
initializing data...data is now [42 1.7 17]
clearing data...data is now [0 0 0]
reading data...data is now [42 1.7 17]

(I've shortened your MsgData type to only contain three data members, but this was just for testing.)

For each message type, you need to define its buffer_size static constant and two get_tied_tuple() member functions, one const and one non-const, both implemented in the same way. (Of course, these could just as well be non-members, but I tried to keep them close to the list of data members they are tied to.)
For some types (like std::string) you will need to add special overloads of those detail::read_value() and detail::write_value() functions.
The rest of the machinery stays the same for all message types.
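As an illustration of such special overloads, here is a sketch for std::string. The wire layout used (one length byte followed by that many characters) is purely an assumption for the example and not something taken from the GRIL protocol. The generic templates are repeated in a memcpy-based form only to keep the snippet self-contained.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

typedef unsigned char byte_t;

namespace detail {

    // Generic versions for trivially copyable types.
    template<typename T>
    const byte_t* read_value(const byte_t* bin, T& val)
    {
        std::memcpy(&val, bin, sizeof(T));
        return bin + sizeof(T);
    }
    template<typename T>
    byte_t* write_value(byte_t* bin, const T& val)
    {
        std::memcpy(bin, &val, sizeof(T));
        return bin + sizeof(T);
    }

    // Overloads for std::string.
    // ASSUMED layout: one length byte, then that many characters.
    inline const byte_t* read_value(const byte_t* bin, std::string& val)
    {
        const std::size_t len = bin[0];
        val.assign(reinterpret_cast<const char*>(bin + 1), len);
        return bin + 1 + len;
    }
    inline byte_t* write_value(byte_t* bin, const std::string& val)
    {
        // The caller must guarantee val.size() <= 255 and enough room in bin.
        bin[0] = static_cast<byte_t>(val.size());
        std::memcpy(bin + 1, val.data(), val.size());
        return bin + 1 + val.size();
    }
}
```

Two caveats: the overloads should be declared before msg_serializer so that the calls inside it can find them, and a message with a variable-length member no longer has a compile-time buffer_size.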

With full C++1x support you might be able to get rid of having to fully type out the explicit return types of the get_tied_tuple() member functions, but I haven't actually tried this.
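One way this might look, assuming a compiler with C++14 return type deduction (a later standard than what was available when this was written), is sketched below. The struct is the shortened three-member MsgData from the test.

```cpp
#include <cstdint>
#include <tuple>

struct MsgData {
    std::uint8_t num;
    float x;
    std::uint8_t elevation;

    // C++14: the return type is deduced from std::tie, so the tuple type
    // is never spelled out. The const overload yields const references.
    auto get_tied_tuple()       { return std::tie(num, x, elevation); }
    auto get_tied_tuple() const { return std::tie(num, x, elevation); }
};
```

The member list is then typed once per overload and nowhere else.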

sbi answered Sep 29 '22 17:09


My solution for parsing binary input is to use a Reader class: for each message field you state what is read, and the reader can check for overruns, underruns, and so on.

In your case:

msg.num = Reader.getChar();
msg.x = Reader.getFloat();
msg.y = Reader.getFloat();
msg.z = Reader.getFloat();
msg.elevation = Reader.getChar();

It still is a lot of work and error-prone, but at least it helps checking for errors.
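A minimal sketch of what such a Reader could look like is below. Only getChar() and getFloat() come from the snippet above; the constructor, remaining(), and the choice to throw std::out_of_range on an overrun are assumptions for illustration, as is host byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

class Reader {
public:
    Reader(const unsigned char* data, std::size_t size)
        : cur_(data), end_(data + size) {}

    std::uint8_t getChar()  { return get<std::uint8_t>(); }
    float        getFloat() { return get<float>(); }

    // Bytes left unread; non-zero after parsing a message hints at an underrun.
    std::size_t remaining() const { return static_cast<std::size_t>(end_ - cur_); }

private:
    // Bounds-checked, alignment-safe read of one value.
    template <typename T>
    T get()
    {
        if (remaining() < sizeof(T))
            throw std::out_of_range("Reader: read past end of message");
        T value;
        std::memcpy(&value, cur_, sizeof(T));
        cur_ += sizeof(T);
        return value;
    }

    const unsigned char* cur_;
    const unsigned char* end_;
};
```

With this, the reader variable in the snippet above would be constructed from the message body and its length before the fields are read.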

stefaanv answered Sep 29 '22 19:09


The simple answer is no: if the message is in a specific binary format that cannot simply be cast, you have no choice but to write a parser for it. If you have the message descriptions (say, XML or some other easily parsed description), why don't you generate the parsing code automatically from that description? The parser won't be as fast as a cast, but generating the code will be a damn sight faster than writing each message's parser by hand...
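As a toy sketch of the idea, the generator below turns an in-memory field list into the text of a read function. Everything here is hypothetical: the Field struct stands in for whatever the parsed description yields, and the emitted code assumes some Reader type with a get<T>() member.

```cpp
#include <string>
#include <vector>

// One entry of a (hypothetical) message description.
struct Field {
    std::string type;
    std::string name;
};

// Emits the source text of a read function for one message type.
// The generated code targets an assumed Reader with a get<T>() member.
std::string generate_reader(const std::string& msg,
                            const std::vector<Field>& fields)
{
    std::string out = "void read(Reader& r, " + msg + "& m)\n{\n";
    for (const Field& f : fields)
        out += "    m." + f.name + " = r.get<" + f.type + ">();\n";
    out += "}\n";
    return out;
}
```

Run over all message descriptions at build time, something like this would write the per-message boilerplate once instead of by hand.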

Nim answered Sep 29 '22 19:09


I don't think you can avoid writing a specific parsing routine for every message in pure C++ (without using pragma).

If all your messages are simple, POD, C-like structures, I think the easiest solution would be to write a code generator. Put your structs in a header without any other C++ stuff, and write (or look for) a simple parser that is able to find the variable names in any message; a perl/python/bash script using a couple of regular expressions should be enough. Then use it to automatically generate the code that reads each message, like this:

YourStreamType & operator>>( YourStreamType &stream, MsgData &msg ) {
    stream >> msg.num >> msg.x >> msg.y >> msg.z >> msg.elevation;
    return stream;
}

Overload operator>> for YourStreamType for every basic type your messages contain and you should be done:

MsgData msg;
your_stream >> msg;
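
For completeness, here is a sketch of what the stream side could look like. ByteStream stands in for YourStreamType and, like extract() and the ok flag, is invented for this example; it assumes host byte order and handles only the two basic types used in MsgData.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for YourStreamType: a cursor over a byte buffer.
struct ByteStream {
    const unsigned char* cur;
    const unsigned char* end;
    bool ok; // cleared on the first read that would run past the end
};

// Shared implementation: bounds-checked, alignment-safe read of one value.
template <typename T>
ByteStream& extract(ByteStream& s, T& out)
{
    if (s.ok && static_cast<std::size_t>(s.end - s.cur) >= sizeof(T)) {
        std::memcpy(&out, s.cur, sizeof(T));
        s.cur += sizeof(T);
    } else {
        s.ok = false;
    }
    return s;
}

// One overload per basic type that appears in the messages.
inline ByteStream& operator>>(ByteStream& s, std::uint8_t& v) { return extract(s, v); }
inline ByteStream& operator>>(ByteStream& s, float& v)        { return extract(s, v); }

struct MsgData {
    std::uint8_t num;
    float x, y, z;
    std::uint8_t elevation;
};

// The part a generator would emit for each message type.
inline ByteStream& operator>>(ByteStream& stream, MsgData& msg)
{
    stream >> msg.num >> msg.x >> msg.y >> msg.z >> msg.elevation;
    return stream;
}
```

After reading, checking the ok flag tells you whether the buffer held a complete message.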

peoro answered Sep 29 '22 18:09