 

How to reduce boilerplate currently necessary for serialization

Our software is abstracting away hardware, and we have classes that represent this hardware's state and have lots of data members for all properties of that external hardware. We need to regularly update other components about that state, and for that we send protobuf-encoded messages via MQTT and other messaging protocols. There are different messages that describe different aspects of the hardware, so we need to send different views of the data of those classes. Here's a sketch:

struct some_data {
  Foo foo;
  Bar bar;
  Baz baz;
  Fbr fbr;
  // ...
};

Let's assume we need to send one message containing foo and bar, and one containing bar and baz. Our current way of doing this involves a lot of boilerplate:

struct foobar {
  Foo foo;
  Bar bar;
  foobar(const Foo& foo, const Bar& bar) : foo(foo), bar(bar) {}
  bool operator==(const foobar& rhs) const {return foo == rhs.foo && bar == rhs.bar;}
  bool operator!=(const foobar& rhs) const {return !operator==(rhs);}
};

struct barbaz {
  Bar bar;
  Baz baz;
  barbaz(const Bar& bar, const Baz& baz) : bar(bar), baz(baz) {}
  bool operator==(const barbaz& rhs) const {return bar == rhs.bar && baz == rhs.baz;}
  bool operator!=(const barbaz& rhs) const {return !operator==(rhs);}
};

template<> struct serialization_traits<foobar> {
  static SerializedFooBar encode(const foobar& fb) {
    SerializedFooBar sfb;
    sfb.set_foo(fb.foo);
    sfb.set_bar(fb.bar);
    return sfb;
  }
};

template<> struct serialization_traits<barbaz> {
  static SerializedBarBaz encode(const barbaz& bb) {
    SerializedBarBaz sbb;
    sbb.set_bar(bb.bar);
    sbb.set_baz(bb.baz);
    return sbb;
  }
};

This can then be sent:

void send(const some_data& data) {
  send_msg( serialization_traits<foobar>::encode(foobar(data.foo, data.bar)) );
  send_msg( serialization_traits<barbaz>::encode(barbaz(data.bar, data.baz)) );
}

Given that the data sets to be sent are often much larger than two items, that we need to decode that data, too, and that we have tons of these messages, there is a lot more boilerplate involved than what's in this sketch. So I have been searching for a way to reduce this. Here's a first idea:

typedef std::tuple< Foo /* 0 foo */
                  , Bar /* 1 bar */
                  > foobar;
typedef std::tuple< Bar /* 0 bar */
                  , Baz /* 1 baz */
                  > barbaz;
// yay, we get comparison for free!

template<>
struct serialization_traits<foobar> {
  static SerializedFooBar encode(const foobar& fb) {
    SerializedFooBar sfb;
    sfb.set_foo(std::get<0>(fb));
    sfb.set_bar(std::get<1>(fb));
    return sfb;
  }
};

template<>
struct serialization_traits<barbaz> {
  static SerializedBarBaz encode(const barbaz& bb) {
    SerializedBarBaz sbb;
    sbb.set_bar(std::get<0>(bb));
    sbb.set_baz(std::get<1>(bb));
    return sbb;
  }
};

void send(const some_data& data) {
  send_msg( serialization_traits<foobar>::encode(std::tie(data.foo, data.bar)) );
  send_msg( serialization_traits<barbaz>::encode(std::tie(data.bar, data.baz)) );
}
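Since the question also mentions decoding, here is a sketch of how the same tuple typedef could serve both directions: `std::tie` on the right-hand side builds the view for encoding, and on the left-hand side of an assignment it unpacks a decoded tuple straight into the data members. Assumptions for illustration: `Foo` and `Bar` are `int` and `double`, `SerializedFooBar` is a hand-written stand-in with protobuf-style setters and getters, and the `decode` member is hypothetical. It uses C++11 `std::tuple` but no `auto` or lambdas; under C++03 the same should work with `std::tr1::tuple`, `std::tr1::tie` and `std::tr1::get`.

```cpp
#include <cassert>
#include <tuple>

// Stand-ins (assumptions) for the real hardware types; they are distinct types.
typedef int    Foo;
typedef double Bar;

// Hypothetical stand-in for the protobuf-generated class.
class SerializedFooBar {
public:
    SerializedFooBar() : foo_(0), bar_(0.0) {}
    void set_foo(Foo v) { foo_ = v; }
    void set_bar(Bar v) { bar_ = v; }
    Foo foo() const { return foo_; }
    Bar bar() const { return bar_; }
private:
    Foo foo_;
    Bar bar_;
};

struct some_data {
    Foo foo;
    Bar bar;
};

typedef std::tuple< Foo /* 0 foo */
                  , Bar /* 1 bar */
                  > foobar;

template <class T> struct serialization_traits;

template <>
struct serialization_traits<foobar> {
    static SerializedFooBar encode(const foobar& fb) {
        SerializedFooBar sfb;
        sfb.set_foo(std::get<0>(fb));
        sfb.set_bar(std::get<1>(fb));
        return sfb;
    }
    // Hypothetical decode counterpart: rebuilds the tuple from the message.
    static foobar decode(const SerializedFooBar& sfb) {
        return foobar(sfb.foo(), sfb.bar());
    }
};

// std::tie on the left-hand side assigns each tuple element to a member.
void receive(some_data& data, const SerializedFooBar& msg) {
    std::tie(data.foo, data.bar) = serialization_traits<foobar>::decode(msg);
}

// Usage example: encode one view of some_data, then decode it into another.
void example() {
    some_data out = { 7, 2.5 };
    SerializedFooBar msg =
        serialization_traits<foobar>::encode(std::tie(out.foo, out.bar));
    some_data in = { 0, 0.0 };
    receive(in, msg);
    // Comparison comes for free with tuples, too.
    assert(std::tie(in.foo, in.bar) == std::tie(out.foo, out.bar));
}
```

The member list appears once per direction in the traits, while the call sites only name the members they tie.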

I got this working, and it cuts the boilerplate considerably. (Not in this small example, but imagine a dozen data points being encoded and decoded: no longer having to list the data members over and over makes a big difference.) However, this has two disadvantages:

  1. This relies on Foo, Bar, and Baz being distinct types. If they are all int, we need to add a dummy tag type to the tuple.

    This can be done, but it does make this whole idea considerably less appealing.

  2. What were variable names in the old code become comments and indices in the new code. That's pretty bad. A bug that confuses two members is likely to be present in both the encoding and the decoding, so it can't be caught by simple unit tests; catching it requires test components built with other technologies (i.e., integration tests).

    I have no idea how to fix this.
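A minimal sketch of why problem 2 slips past simple unit tests: if the encoder and the decoder confuse the same two indices, the second swap undoes the first, so a round trip still reproduces the input. The message class `SerializedRange` and its `low`/`high` fields are hypothetical; both fields are `int`, so this is also the situation from problem 1, where the types alone cannot tell the members apart.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical message with two fields of the same type.
class SerializedRange {
public:
    SerializedRange() : low_(0), high_(0) {}
    void set_low(int v) { low_ = v; }
    void set_high(int v) { high_ = v; }
    int low() const { return low_; }
    int high() const { return high_; }
private:
    int low_;
    int high_;
};

typedef std::tuple< int /* 0 low */
                  , int /* 1 high */
                  > range_view;

// Buggy encoder: the two indices are swapped.
SerializedRange encode_range(const range_view& v) {
    SerializedRange m;
    m.set_low(std::get<1>(v));   // BUG: should be std::get<0>
    m.set_high(std::get<0>(v));  // BUG: should be std::get<1>
    return m;
}

// Decoder written with the same mix-up, so it swaps the values back.
range_view decode_range(const SerializedRange& m) {
    return range_view(m.high(), m.low());  // BUG: should be (m.low(), m.high())
}

void example() {
    range_view original(1, 9);
    // The round trip reproduces the input, so this unit test passes...
    assert(decode_range(encode_range(original)) == original);
    // ...although the message puts each value into the wrong field.
    SerializedRange wire = encode_range(original);
    assert(wire.low() == 9 && wire.high() == 1);
}
```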

Does anybody have a better idea for reducing this boilerplate?

Note:

  • For the time being, we're stuck with C++03. Yes, you read that right. For us, it's std::tr1::tuple. No lambdas. And no auto either.
  • We have tons of code employing those serialization traits. We cannot throw away the whole scheme and do something completely different. I am looking for a solution that simplifies future code while fitting into the existing framework. Any idea that requires us to rewrite the whole thing will very likely be dismissed.
asked May 14 '18 at 20:05 by sbi


2 Answers

In my opinion, the best all-around solution is an external C++ code generator in a scripting language. It has the following advantages:

  • Flexibility: it allows you to change the generated code at any time. This is extremely good for several sub-reasons:

    • Readily fix bugs in all old supported releases.
    • Use new C++ features if you move to C++11 or later in the future.
    • Generate code for a different language. This is very, very useful (especially if your organization is big and/or you have many users). For instance, you could output a small scripting library (e.g. a Python module) that can be used as a CLI tool to interface with the hardware. In my experience, hardware engineers liked this a lot.
    • Generate GUI code (or GUI descriptions, e.g. in XML/JSON; or even a web interface) -- useful for people using the final hardware and testers.
    • Generate other kinds of data. For instance, diagrams, statistics, etc. Or even the protobuf descriptions themselves.
  • Maintenance: it will be easier to maintain than the equivalent C++. Even if it is written in a different language, it is typically easier to learn that language than to have a new C++ developer dive into C++ template metaprogramming (especially in C++03).

  • Performance: it can easily reduce the compilation time of the C++ side (since you can output very simple C++ -- even plain C). Of course, the generator may offset this advantage. In your case, this may not apply, since it looks like you cannot change the client code.

I have used that approach in a couple of projects/systems and it turned out quite nicely. In particular, the different ways of using the hardware (C++ lib, Python lib, CLI, GUI...) were much appreciated.


Side note: if part of the generation requires parsing existing C++ code (e.g. headers with the data types to be serialized, as with the Serialized types in the OP's case), then LLVM/clang's tooling is a very nice way to do it.

In a particular project I worked on, we had to serialize dozens of C++ types automatically (types that users could change at any time). We managed to generate the code for them automatically, just by using the clang Python bindings and integrating that step into the build process. While the Python bindings did not expose all the AST details (at the time, at least), they were enough to generate the required serialization code for all our types (which included class templates, containers, etc.).

answered Nov 07 '22 at 00:11 by Acorn


I will build on your proposed solution, but use boost::fusion::tuple instead (assuming Boost is allowed). Let's assume your data types are

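#include <iostream>
#include <boost/fusion/include/tuple.hpp>
#include <boost/fusion/include/fold.hpp>

struct Foo{};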
struct Bar{};
struct Baz{};
struct Fbr{};

and your data is

struct some_data {
    Foo foo;
    Bar bar;
    Baz baz;
    Fbr fbr;
};

From the comments, I understand that you have no control over the SerializedXYZ classes, but that they do have a certain interface. I will assume that something like this is close enough:

struct SerializedFooBar {

    void set_foo(const Foo&){
        std::cout << "set_foo in SerializedFooBar" << std::endl;
    }

    void set_bar(const Bar&){
        std::cout << "set_bar in SerializedFooBar" << std::endl;
    }
};

// another protobuf-generated class
struct SerializedBarBaz {

    void set_bar(const Bar&){
        std::cout << "set_bar in SerializedBarBaz" << std::endl;
    }

    void set_baz(const Baz&){
        std::cout << "set_baz in SerializedBarBaz" << std::endl;
    }
};

We can now reduce the boilerplate and limit it to one typedef per datatype-permutation and one simple overload for each set_XXX member of the SerializedXYZ class, as follows:

typedef boost::fusion::tuple<Foo, Bar> foobar;
typedef boost::fusion::tuple<Bar, Baz> barbaz;
//...

template <class S>
void serialized_set(S& s, const Foo& v) {
    s.set_foo(v);
}

template <class S>
void serialized_set(S& s, const Bar& v) {
    s.set_bar(v);
}

template <class S>
void serialized_set(S& s, const Baz& v) {
    s.set_baz(v);
}

template <class S>
void serialized_set(S& s, const Fbr& v) {
    s.set_fbr(v);
}
//...

The good thing now is that you do not need to specialise your serialization_traits anymore. The following makes use of the boost::fusion::fold function, which I assume is OK to use in your project:

template <class SerializedX>
class serialization_traits {

    struct set_functor {
        // lets boost::result_of determine the return type under C++03
        typedef SerializedX& result_type;

        template <class V>
        SerializedX& operator()(SerializedX& s, const V& v) const {
            serialized_set(s, v);
            return s;
        }
    };

public:

    template <class Tuple>
    static SerializedX encode(const Tuple& t) {
        SerializedX s;
        boost::fusion::fold(t, s, set_functor());
        return s;
    }
};

And here are some examples of how it works. Notice that if someone tries to tie a data member from some_data that is not compliant with the SerializedXYZ interface, the compiler will inform you about it:

void send_msg(const SerializedFooBar&){
    std::cout << "Sent SerializedFooBar" << std::endl;
}

void send_msg(const SerializedBarBaz&){
    std::cout << "Sent SerializedBarBaz" << std::endl;
}

void send(const some_data& data) {
  send_msg( serialization_traits<SerializedFooBar>::encode(boost::fusion::tie(data.foo, data.bar)) );
  send_msg( serialization_traits<SerializedBarBaz>::encode(boost::fusion::tie(data.bar, data.baz)) );
//  send_msg( serialization_traits<SerializedFooBar>::encode(boost::fusion::tie(data.foo, data.baz)) ); // compiler error; SerializedFooBar has no set_baz member
}

int main() {

    some_data my_data;
    send(my_data);
}
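If Boost.Fusion were not available, the same type-driven dispatch could be sketched with plain template recursion over the tuple indices. This is a swapped-in technique rather than the code above: the `set_each` helper is hypothetical, `SerializedFooBar` is a stand-in that just records values, and it uses C++11 `std::tuple`, `std::tuple_size` and `std::get` (with C++03, the `std::tr1::` counterparts should play the same roles).

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

struct Foo { int v; };
struct Bar { int v; };

// Stand-in (assumption) for a protobuf-generated class.
struct SerializedFooBar {
    int foo, bar;
    SerializedFooBar() : foo(0), bar(0) {}
    void set_foo(const Foo& f) { foo = f.v; }
    void set_bar(const Bar& b) { bar = b.v; }
};

// One overload per data type, as in the answer.
template <class S> void serialized_set(S& s, const Foo& v) { s.set_foo(v); }
template <class S> void serialized_set(S& s, const Bar& v) { s.set_bar(v); }

// Calls serialized_set for elements I, I+1, ..., N-1 of the tuple.
template <std::size_t I, std::size_t N>
struct set_each {
    template <class S, class Tuple>
    static void apply(S& s, const Tuple& t) {
        serialized_set(s, std::get<I>(t));
        set_each<I + 1, N>::apply(s, t);
    }
};

// Recursion ends once every element has been visited.
template <std::size_t N>
struct set_each<N, N> {
    template <class S, class Tuple>
    static void apply(S&, const Tuple&) {}
};

template <class SerializedX>
struct serialization_traits {
    template <class Tuple>
    static SerializedX encode(const Tuple& t) {
        SerializedX s;
        set_each<0, std::tuple_size<Tuple>::value>::apply(s, t);
        return s;
    }
};

void example() {
    Foo f = { 1 };
    Bar b = { 2 };
    SerializedFooBar msg =
        serialization_traits<SerializedFooBar>::encode(std::tie(f, b));
    assert(msg.foo == 1 && msg.bar == 2);
}
```

Because each element is dispatched on its type, the order of the elements in the tuple does not matter.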


EDIT:

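Unfortunately, this solution does not tackle problem #1 of the OP. To remedy this, we can define a tag for each of your data members and follow a similar approach. Assume that some_data now has two members of the same type, bar1 and bar2, instead of a single bar, and that the SerializedXYZ classes have matching set_bar1/set_bar2 setters. Here are the tags, along with the modified serialized_set functions: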

struct foo_tag{};
struct bar1_tag{};
struct bar2_tag{};
struct baz_tag{};
struct fbr_tag{};

template <class S>
void serialized_set(S& s, const some_data& data, foo_tag) {
    s.set_foo(data.foo);
}

template <class S>
void serialized_set(S& s, const some_data& data, bar1_tag) {
    s.set_bar1(data.bar1);
}

template <class S>
void serialized_set(S& s, const some_data& data, bar2_tag) {
    s.set_bar2(data.bar2);
}

template <class S>
void serialized_set(S& s, const some_data& data, baz_tag) {
    s.set_baz(data.baz);
}

template <class S>
void serialized_set(S& s, const some_data& data, fbr_tag) {
    s.set_fbr(data.fbr);
}

The boilerplate is again limited to one serialized_set per data member and scales linearly, as in the first version. Here is the modified serialization_traits:

// the serialization_traits doesn't need specialization anymore :)
template <class SerializedX>
class serialization_traits {

    class set_functor {

        const some_data& m_data;

    public:

        typedef SerializedX& result_type;

        set_functor(const some_data& data)
        : m_data(data){}

        template <class Tag>
        SerializedX& operator()(SerializedX& s, Tag tag) const {
            serialized_set(s, m_data, tag);
            return s;
        }
    };

public:

    template <class Tuple>
    static SerializedX encode(const some_data& data, const Tuple& t) {
        SerializedX s;
        boost::fusion::fold(t, s, set_functor(data));
        return s;
    }
};

and here is how it works:

void send(const some_data& data) {

    send_msg( serialization_traits<SerializedFooBar>::encode(data,
    boost::fusion::make_tuple(foo_tag(), bar1_tag())));

    send_msg( serialization_traits<SerializedBarBaz>::encode(data,
    boost::fusion::make_tuple(baz_tag(), bar1_tag(), bar2_tag())));
}
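The tag-based version can be sketched the same Boost-free way to show how tags keep two members of the same type apart. Assumptions: `bar1` and `bar2` are both `int`, `SerializedBars` is a hypothetical stand-in with `set_bar1`/`set_bar2`, and the tag list is a `std::tuple` of empty tag objects walked by index recursion (the hypothetical `set_tagged` helper) instead of `boost::fusion::fold`.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

// bar1 and bar2 share a type, so type-based dispatch cannot tell them apart.
struct some_data {
    int bar1;
    int bar2;
};

struct bar1_tag {};
struct bar2_tag {};

// Hypothetical stand-in for a protobuf-generated message.
struct SerializedBars {
    int bar1, bar2;
    SerializedBars() : bar1(0), bar2(0) {}
    void set_bar1(int v) { bar1 = v; }
    void set_bar2(int v) { bar2 = v; }
};

// One overload per data member, selected by the (empty) tag type.
template <class S>
void serialized_set(S& s, const some_data& d, bar1_tag) { s.set_bar1(d.bar1); }
template <class S>
void serialized_set(S& s, const some_data& d, bar2_tag) { s.set_bar2(d.bar2); }

// Calls serialized_set for tags I, I+1, ..., N-1 of the tag tuple.
template <std::size_t I, std::size_t N>
struct set_tagged {
    template <class S, class Tags>
    static void apply(S& s, const some_data& d, const Tags& tags) {
        serialized_set(s, d, std::get<I>(tags));
        set_tagged<I + 1, N>::apply(s, d, tags);
    }
};

template <std::size_t N>
struct set_tagged<N, N> {
    template <class S, class Tags>
    static void apply(S&, const some_data&, const Tags&) {}
};

template <class SerializedX, class Tags>
SerializedX encode(const some_data& d, const Tags& tags) {
    SerializedX s;
    set_tagged<0, std::tuple_size<Tags>::value>::apply(s, d, tags);
    return s;
}

void example() {
    some_data d = { 10, 20 };
    SerializedBars msg =
        encode<SerializedBars>(d, std::make_tuple(bar1_tag(), bar2_tag()));
    assert(msg.bar1 == 10 && msg.bar2 == 20);
}
```

The tags carry no data: they only select the serialized_set overload, and the values always come from some_data.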


answered Nov 07 '22 at 00:11 by linuxfever