
Handle removed variable from boost serialize

I looked at the example on the web about adding a member variable to the serialization function by incrementing the version number and adding an "if" around the serialization of that variable.

But what do I have to do if I remove a member variable? Should I just remove it from the serialization function and boost will take care of it?

This gets worse if I remove classes that were themselves serialized inside the serialize function. Do I need to keep them around just for that serialization code, or is there another way?

Vincent asked May 21 '15 16:05


1 Answer

Background / Archive format compatibility

Boost Serialization is pretty lightweight in lots of scenarios.

Specifically, if you don't employ object tracking or dynamic polymorphism, there's a surprising amount of leeway that keeps your serialization streams compatible.

Both tracking and polymorphism become a factor when serializing through (smart) pointers (to base).

Most things in the standard library, as well as in modern C++, favour value-semantics (e.g. all standard containers) and by immediate implication, play well here.

As a specific example, I've had lots of success serializing

std::map<std::string, boost::uuids::uuid>

into a binary archive, and de-serializing this archive as

boost::unordered_map<std::string, boost::uuids::uuid>
// or
boost::flat_map<std::string, boost::uuids::uuid>
// or
std::vector<std::pair<std::string, boost::uuids::uuid> >

None of these types (need to) store type information, so the binary streams are compatible and interchangeable.

If you want to rely on this kind of "incidental" compatibility, you will want to write extensive tests.

I have a feeling you should be able to devise a trivial archive implementation that, instead of serializing actual data, creates a "layout map" or "compatibility signature" of the data-structures involved.

This could go a long way toward giving you the confidence to rely on archive compatibility between distinct types.
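To make the "layout map" idea concrete, here is a minimal sketch. It is not a real Boost archive: `SignatureArchive`, `layout_signature`, `OldLayout`, `Renamed` and `Reordered` are hypothetical names invented for illustration, and the sketch only mimics the `ar & member` convention. It also ignores the class-version, tracking and item-version metadata that real archives write, so treat a matching signature as a hint to test further, not as proof of compatibility.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an archive: records WHICH primitives would be
// written, in order, instead of writing any data.
struct SignatureArchive {
    std::string signature;

    SignatureArchive& operator&(bool const&)        { signature += "b;"; return *this; }
    SignatureArchive& operator&(int const&)         { signature += "i;"; return *this; }
    SignatureArchive& operator&(std::string const&) { signature += "s;"; return *this; }

    // A pair contributes its two members, one after the other.
    template <typename A, typename B>
    SignatureArchive& operator&(std::pair<A, B> const& p) { return *this & p.first & p.second; }

    // Containers are modelled as "count, then repeated items".
    template <typename T>
    SignatureArchive& operator&(std::vector<T> const&) { return sequence_of(T()); }

    template <typename K, typename V>
    SignatureArchive& operator&(std::map<K, V> const&) { return sequence_of(std::pair<K, V>()); }

    // Anything else is assumed to provide a Boost-style serialize(Ar&, unsigned).
    template <typename T>
    SignatureArchive& operator&(T const& obj) {
        const_cast<T&>(obj).serialize(*this, 0u);
        return *this;
    }

  private:
    template <typename Item> SignatureArchive& sequence_of(Item const& item) {
        signature += "n[";
        *this & item;
        signature += "];";
        return *this;
    }
};

// Signature of a default-constructed T.
template <typename T> std::string layout_signature() {
    T value{};
    SignatureArchive ar;
    ar & value;
    return ar.signature;
}

// Illustrative types: the first two share a layout, the third does not.
struct OldLayout {
    bool hasValue; std::string value;
    template <typename Ar> void serialize(Ar& ar, unsigned) { ar & hasValue & value; }
};
struct Renamed {
    bool enabled; std::string label;
    template <typename Ar> void serialize(Ar& ar, unsigned) { ar & enabled & label; }
};
struct Reordered {
    std::string value; bool hasValue;
    template <typename Ar> void serialize(Ar& ar, unsigned) { ar & value & hasValue; }
};
```

Comparing `layout_signature<std::map<std::string, int>>()` with `layout_signature<std::vector<std::pair<std::string, int>>>()` in a unit test would then flag a layout change at build time, before any real archive is involved.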

Case Study 1: changed layout

This closely matches the original question: "How do I de-serialize old versions once a field has been removed".

Here, the key is: serialize is just a function. You can do whatever you need. Take a simple demo class that went through two versions:

struct MyType {
    MyType();
    MyType(std::string const& v);

  private:
    friend class boost::serialization::access;
    template <typename Ar> void serialize(Ar&, unsigned);

#if DEMO_VERSION == 0

    bool hasValue;
    std::string value;

#elif DEMO_VERSION == 1

    boost::optional<std::string> value;

#endif
};

Obviously, there'll be different implementations for the versions.

The trick is to de-serialize to temporary variables, and then map the old semantics on the new semantics according to your business rules:

#if DEMO_VERSION == 0
MyType::MyType()                     : hasValue(false)          {}
MyType::MyType(std::string const &v) : hasValue(true), value(v) {}

template <typename Ar> void MyType::serialize(Ar& ar, unsigned /*file_version*/) {
    ar & hasValue & value; // life was simple in v0
}

#elif DEMO_VERSION == 1
MyType::MyType()                     : value(boost::none)       {}
MyType::MyType(std::string const &v) : value(v)                 {}

template <typename Ar> void MyType::serialize(Ar& ar, unsigned file_version) {
    switch (file_version) {
        case 0: {
            assert(Ar::is_loading::value); // should not be writing old formats
            //
            bool        old_hasValue;      // these fields no longer exist
            std::string oldValue;

            ar & old_hasValue & oldValue;

            // translate to new object semantics/layout
            value.reset();
            if (old_hasValue) value.reset(oldValue);

            break;
        }
        default: // v1+
            ar & value;
    }
}
#endif

You can see this process live on Coliru where program v0 writes an object to v0.dat, which program v1 successfully reads (and serializes in the new format):


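// Continues the same main.cpp, after the MyType definition and serialize() shown above.
BOOST_CLASS_VERSION(MyType, DEMO_VERSION)
#include <fstream>
#include <iostream>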

namespace demo {
    template <typename T> void serialize(std::ostream& os, T const& obj) {
        {
            boost::archive::text_oarchive oa(os);
            oa << obj;
        }
        os.flush();
    }

    template <typename T> void save(std::string const& fname, T const& payload) {
        std::ofstream ofs(fname, std::ios::binary);
        serialize(ofs, payload);
    }

    MyType load(std::string const& fname) {
        std::ifstream ifs(fname, std::ios::binary);

        MyType obj;

        boost::archive::text_iarchive ia(ifs);
        ia >> obj;

        return obj;
    }
}

int main(int, char** cmd) {
    std::cout << "Running " << *cmd << " with DEMO_VERSION=" << DEMO_VERSION << "\n";
    using namespace demo;

#if DEMO_VERSION == 0

    MyType payload("Forty two");
    save     ("v0.dat", payload);  // uses v0 format
    serialize(std::cout, payload); // uses v0 format

#elif DEMO_VERSION == 1

    auto loaded = load("v0.dat");  // still reads the v0 format
    serialize(std::cout, loaded);  // uses v1 format now

#endif
}

Building and running both versions prints:

for v in 0 1
do
    g++ -std=c++11 -Os -Wall -DDEMO_VERSION=$v main.cpp -o v$v -lboost_system -lboost_serialization
    ./v$v
done
Running ./v0 with DEMO_VERSION=0
22 serialization::archive 11 0 0 1 9 Forty two
Running ./v1 with DEMO_VERSION=1
22 serialization::archive 11 0 1 0 0 1 0 9 Forty two

Case Study 2: changed/removed types

Like you said, probably the easiest thing to do would be to keep the old type for indirect de-serialization.
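As a rough sketch of that "keep the old type just for loading" approach, the following uses a toy token stream instead of a real Boost archive, so that it stays self-contained. Everything here is an assumption made for illustration: `ToyWriter`, `ToyReader`, `HolderV0` and `Holder` are hypothetical, the layout of `legacy::PoorMansOptional` is guessed, and `std::optional` (C++17) stands in for `boost::optional`. The toy format writes a version number before every class, which is only loosely modelled on what real archives do.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

// Toy output "archive": whitespace-separated tokens, version before each class.
struct ToyWriter {
    std::ostringstream out;
    ToyWriter& operator&(bool& b)        { out << b << ' '; return *this; }
    ToyWriter& operator&(std::string& s) { out << s.size() << ' ' << s << ' '; return *this; }
    template <typename T> ToyWriter& operator&(T& obj) {
        out << T::version << ' ';            // always writes the current version
        obj.serialize(*this, T::version);
        return *this;
    }
};

// Toy input "archive": hands the stored version to serialize().
struct ToyReader {
    std::istringstream in;
    explicit ToyReader(std::string const& data) : in(data) {}
    ToyReader& operator&(bool& b) { in >> b; return *this; }
    ToyReader& operator&(std::string& s) {
        std::size_t n = 0;
        in >> n;
        in.get();                            // skip the separator after the length
        s.resize(n);
        in.read(&s[0], static_cast<std::streamsize>(n));
        return *this;
    }
    template <typename T> ToyReader& operator&(T& obj) {
        unsigned file_version = 0;
        in >> file_version;
        obj.serialize(*this, file_version);
        return *this;
    }
};

namespace legacy {
    // No longer used by live code; kept only so old data can still be read.
    struct PoorMansOptional {
        static constexpr unsigned version = 0;
        bool engaged = false;
        std::string value;
        template <typename Ar> void serialize(Ar& ar, unsigned) { ar & engaged & value; }
    };
}

// What the old program wrote.
struct HolderV0 {
    static constexpr unsigned version = 0;
    legacy::PoorMansOptional value;
    template <typename Ar> void serialize(Ar& ar, unsigned) { ar & value; }
};

// What the current program uses.
struct Holder {
    static constexpr unsigned version = 1;
    std::optional<std::string> value;

    template <typename Ar> void serialize(Ar& ar, unsigned file_version) {
        if (file_version == 0) {
            legacy::PoorMansOptional old;    // load through the retired type...
            ar & old;
            if (old.engaged) value = old.value; else value.reset();  // ...then translate
        } else {
            bool engaged  = value.has_value();
            std::string v = value.value_or("");
            ar & engaged & v;                // same code path for saving and loading
            if (engaged) value = v; else value.reset();
        }
    }
};
```

Writing a `HolderV0` with `ToyWriter` and reading the resulting string back into a `Holder` with `ToyReader` exercises the version 0 branch. The retired type stays confined to the `legacy` namespace and to that one branch of `serialize`.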

As the section "Background / Archive format compatibility" above suggests, there's another option, as long as you know what you're doing.

Let's assume that the above sample ("Case Study 1") was slightly different, and used a PoorMansOptional<std::string> that got replaced by a boost::optional<std::string>. You could figure out the equivalent fields to de-serialize.

Take note of the extra item version fields that might be interspersed. Such fields are conveniently absent between items in the container examples mentioned above.

sehe answered Sep 29 '22 19:09