Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Derived class serialization without class tracking in Boost (C++)

I have some problems with boost serialization when serializing derived class through base class pointer. I need a system which serializes some objects as they are being received in the system, so I need to serialize over time. This is not really a problem since I can open a boost::archive::binary_oarchive and serialize objects when required. Rapidly I noticed that boost was performing object tracking by memory address, so the first problem was that different objects in time that share the same memory address were saved as the same object. This can be fixed by using the following macro in the required derived class:

BOOST_CLASS_TRACKING(className, boost::serialization::track_never)

This works fine, but again, when the base class is not abstract, the base class is not serialized properly. In the following example, the base class serialization method is only called once with the first object. In the following, boost assumes that this object has been serialized before although the object has different type.

#include <iostream>
#include <fstream>
#include <boost/serialization/export.hpp>
#include <boost/serialization/base_object.hpp>
#include <boost/serialization/list.hpp>
#include <boost/serialization/map.hpp>
#include <boost/serialization/vector.hpp>
#include <boost/serialization/shared_ptr.hpp>
#include <boost/archive/archive_exception.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>

using namespace std;

class AClass{
public:
    AClass(){}
    virtual ~AClass(){}
private:
    double a;
    double b;
    //virtual void virtualMethod() = 0;
private:
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        ar & a;
        ar & b;
        cout << "A" << endl;
    }
};
//BOOST_SERIALIZATION_ASSUME_ABSTRACT(Aclass)
//BOOST_CLASS_TRACKING(AClass, boost::serialization::track_never)

class BClass : public AClass{
public:
    BClass(){}
    virtual ~BClass(){}
private:
    double c;
    double d;
    virtual void virtualMethod(){};
private:
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        ar & boost::serialization::base_object<AClass>(*this);
        ar & c;
        ar & d;
        cout << "B" << endl;
    }
};
// define export to be able to serialize through base class pointer
BOOST_CLASS_EXPORT(BClass)
BOOST_CLASS_TRACKING(BClass, boost::serialization::track_never)


class CClass : public AClass{
public:
    CClass(){}
    virtual ~CClass(){}
private:
    double c;
    double d;
    virtual void virtualMethod(){};
private:
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        ar & boost::serialization::base_object<AClass>(*this);
        ar & c;
        ar & d;
        cout << "C" << endl;
    }
};
// define export to be able to serialize through base class pointer
BOOST_CLASS_EXPORT(CClass)
BOOST_CLASS_TRACKING(CClass, boost::serialization::track_never)

int main() {
    cout << "Serializing...." << endl;
    {
        ofstream ofs("serialization.dat");
        boost::archive::binary_oarchive oa(ofs);
        for(int i=0;i<5;i++)
        {
            AClass* baseClassPointer = new BClass();
            // serialize object through base pointer
            oa << baseClassPointer;
            // free the pointer so next allocation can reuse memory address
            delete baseClassPointer;
        }

        for(int i=0;i<5;i++)
        {
            AClass* baseClassPointer = new CClass();
            // serialize object through base pointer
            oa << baseClassPointer;
            // free the pointer so next allocation can reuse memory address
            delete baseClassPointer;
        }
    }
    getchar();
    cout << "Deserializing..." << endl;
    {
        ifstream ifs("serialization.dat");
        boost::archive::binary_iarchive ia(ifs);
        try{
            while(true){
                AClass* a;
                ia >> a;
                delete a;
            }
        }catch(boost::archive::archive_exception const& e)
        {

        }
    }
    return 0;
}

When executing this piece of code, the result is as follow:

Serializing....
A
B
B
B
B
B
C
C
C
C
C

Deserializing...
A
B
B
B
B
B
C
C
C
C
C

So the base class is only being serialized once, although the derived class has explicitly the track_never flag. There are two different workarounds to fix this behaviour. The first one is to make the base class abstract with a pure virtual method and calling the macro BOOST_SERIALIZATION_ASSUME_ABSTRACT(Aclass), and the second one is to put the track_never flag also in the base class (commented in code).

None of these solutions meets my requirements, since I want to do in the future punctual serializations of the system state, which would require tracking features for a given DClass extending A (not B or C), and also the AClass should not be abstract.

Any hints? Is there any way to call explicitly the base class serialization method avoiding the tracking feature in the base class (that already has been disabled in the derived class)?

like image 792
Alvaro Luis Bustamante Avatar asked Mar 25 '13 11:03

Alvaro Luis Bustamante


2 Answers

After having a little closer look to boost::serialization I'm also convinced there is no straightforward solution for you request. As you already mentioned the tracking behavior for the serialization is declared on a class by class base with BOOST_CLASS_TRACKING. This const global information is than interpret in the virtual method tracking from class oserializer.

   virtual bool tracking(const unsigned int /* flags */)

Because this is a template class you can explicitly instantiate this method for your classes.

namespace boost {
namespace archive {
namespace detail {

template<>
    virtual bool oserializer<class binary_oarchive, class AClass >::tracking(const unsigned int f /* flags */) const {
        return do_your_own_tracking_decision();
    }

}}}

Now you can try to e.g have something like a global variable and change the tracking behavior from time to time. (E.g depending on which derivate class is written to the archive.) This seems to wok for “Serializing“ but the “Deserializing“ than throw an exception. The reason for this is, that the state of “tracking” for each class is only written ones to the archive. Therefore the deserialize does always expect the data for AClass if BClass or CClass is read (at leased if the first write attempt for AClass was with tracking disabled).

One possible solution could be to use the flags parameter in tracking() method. This parameter represent the flags the archive is created with, default “0”.

binary_oarchive(std::ostream & os, unsigned int flags = 0) 

The archive flags are declared in basic_archive.hpp

enum archive_flags {
    no_header = 1,  // suppress archive header info
    no_codecvt = 2,  // suppress alteration of codecvt facet
    no_xml_tag_checking = 4,   // suppress checking of xml tags
    no_tracking = 8,           // suppress ALL tracking
    flags_last = 8
};

no_tracking seems currently not to be supported, but you can now add this behavior to tracking.

template<>
    virtual bool oserializer<class binary_oarchive, class AClass >::tracking(const unsigned int f /* flags */) const {
        return !(f & no_tracking);
    } 

Now you can at leased decide for different archives whether AClass should be tracked or not.

 boost::archive::binary_oarchive oa_nt(ofs, boost::archive::archive_flags::no_tracking);

And this are the changes in your example.

int main() {
    cout << "Serializing...." << endl;
    {
        ofstream ofs("serialization1.dat");
        boost::archive::binary_oarchive oa_nt(ofs, boost::archive::archive_flags::no_tracking);
        //boost::archive::binary_oarchive oa(ofs);
        for(int i=0;i<5;i++)
        {
            AClass* baseClassPointer = new BClass();
            // serialize object through base pointer
            oa_nt << baseClassPointer;
            // free the pointer so next allocation can reuse memory address
            delete baseClassPointer;
        }

        ofstream ofs2("serialization2.dat");
        boost::archive::binary_oarchive oa(ofs2);
        //boost::archive::binary_oarchive oa(ofs);

        for(int i=0;i<5;i++)
        {
            AClass* baseClassPointer = new CClass();
            // serialize object through base pointer
            oa << baseClassPointer;
            // free the pointer so next allocation can reuse memory address
            delete baseClassPointer;
        }
    }
    getchar();
    cout << "Deserializing..." << endl;
    {
        ifstream ifs("serialization1.dat");
        boost::archive::binary_iarchive ia(ifs);
        try{
            while(true){
                AClass* a;
                ia >> a;
                delete a;
            }
        }catch(boost::archive::archive_exception const& e)
        {

        }

        ifstream ifs2("serialization2.dat");
        boost::archive::binary_iarchive ia2(ifs2);
        try{
            while(true){
                AClass* a;
                ia2 >> a;
                delete a;
            }
        }catch(boost::archive::archive_exception const& e)
        {

        }

    }
    return 0;
}


namespace boost {
namespace archive {
namespace detail {

template<>
    virtual bool oserializer<class binary_oarchive, class AClass >::tracking(const unsigned int f /* flags */) const {
        return !(f & no_tracking);
    }

}}}

This still may not be what you are looking for. There are lot more methods which could be adapted with an own implementation. Or your have to derivate your own archive class.

like image 154
hr_117 Avatar answered Nov 10 '22 02:11

hr_117


Ultimately the problem seems to be that a boost::serialization archive represents state at a single point in time, and you want your archive to contain state that has changed, i.e. pointers that have been reused. I don't think there is a simple boost::serialization flag that induces the behavior you want.

However, I think there are other workarounds that might be sufficient. You can encapsulate the serialization for a class into its own archive, and then archive the encapsulation. That is, you can implement the serialization for B like this (note that you have to split serialize() into save() and load()):

// #include <boost/serialization/split_member.hpp>
// #include <boost/serialization/string.hpp>
// Replace serialize() member function with this.

template<class Archive>
void save(Archive& ar, const unsigned int version) const {
  // Serialize instance to a string (or other container).
  // std::stringstream used here for simplicity.  You can avoid
  // some buffer copying with alternative stream classes that
  // directly access an external container or iterator range.
  std::ostringstream os;
  boost::archive::binary_oarchive oa(os);
  oa << boost::serialization::base_object<AClass>(*this);
  oa << c;
  oa << d;

  // Archive string to top level.
  const std::string s = os.str();
  ar & s;
  cout << "B" << endl;
}

template<class Archive>
void load(Archive& ar, const unsigned int version) {
  // Unarchive string from top level.
  std::string s;
  ar & s;

  // Deserialize instance from string.
  std::istringstream is(s);
  boost::archive::binary_iarchive ia(is);
  ia >> boost::serialization::base_object<AClass>(*this);
  ia >> c;
  ia >> d;
  cout << "B" << endl;
}

BOOST_SERIALIZATION_SPLIT_MEMBER()

Because each instance of B is serialized into its own archive, A is effectively not tracked because there is only one reference per archive of B. This produces:

Serializing....
A
B
A
B
A
B
A
B
A
B
A
C
C
C
C
C

Deserializing...
A
B
A
B
A
B
A
B
A
B
A
C
C
C
C
C

A potential objection to this technique is the storage overhead of encapsulation. The result of the original test program are 319 bytes while the modified test program produces 664 bytes. However, if gzip is applied to both output files then the sizes are 113 bytes for the original and 116 bytes for the modification. If space is a concern then I would recommend adding compression to the outer serialization, which can be easily done with boost::iostreams.

Another possible workaround is to extend the life of instances to the lifespan of the archive so pointers are not reused. You could do this by associating a container of shared_ptr instances to your archive, or by allocating instances from a memory pool.

like image 2
rhashimoto Avatar answered Nov 10 '22 03:11

rhashimoto