
How to make boost::serialization deserialization faster?

I use boost::serialization to save an object that contains this data:

struct Container
{
    struct SmallObject
    {
        struct CustomData
        {
            unsigned first;
            float second;
        };

        std::vector<CustomData> customData; // <- the vector holds 1 to 4 of these
        float data1[3];
        float data2[3];
        float data3[2];
        float data4[4];
    };

    std::vector<SmallObject> mySmallerObjects;  // <- the vector holds 8000 to 13000 of these
};

The serialization code looks like this (this is the intrusive version; the function declarations above are omitted for readability):

template<class Archive> void Container::SmallObject::CustomData::serialize(Archive& ar, unsigned /*version*/)
{
    ar & first;
    ar & second;
}

template<class Archive> void Container::SmallObject::serialize(Archive& ar, unsigned /*version*/)
{
    ar & customData;
    ar & data1;
    ar & data2;
    ar & data3;
    ar & data4;
}

template<class Archive> void Container::serialize(Archive& ar, unsigned /*version*/)
{
    ar & mySmallerObjects;
}

I use binary archives. In release mode, loading my container (with 12000 small objects) takes about 400 milliseconds. I am told this is too long. Are there any settings or different memory layouts that would speed up the loading process? Should I give up on boost::serialization?

asked Jun 24 '11 by wip

2 Answers

If I had to pick the single biggest drawback of Boost.Serialization, it would be poor performance. If 400ms is truly too slow, either get faster hardware or switch to a different serialization library.

That said, just in case you're doing something blatantly "wrong", you should post the serialization code for Container, Container::SmallObject, and Container::SmallObject::CustomData. You should also ensure that it's actually deserialization that's taking 400ms, and not a combination of deserializing + reading the data from the disk; i.e., load the data into a memory-stream of some sort and deserialize from that, rather than deserializing from an std::fstream.


EDIT (in response to comments):

This code works for me using VC++ 2010 SP1 and Boost 1.47 beta:

double loadArchive(std::string const& archiveFileName, Container& data)
{
    std::ifstream fileStream(
        archiveFileName.c_str(),
        std::ios_base::binary | std::ios_base::in
    );
    std::stringstream buf(
        std::ios_base::binary | std::ios_base::in | std::ios_base::out
    );
    buf << fileStream.rdbuf();
    fileStream.close();

    StartCounter(); // StartCounter()/GetCounter() are high-resolution timing helpers (definitions not shown)
    boost::archive::binary_iarchive(buf) >> data;
    return GetCounter(); // elapsed time in milliseconds
}

If this doesn't work for you, it must be specific to the compiler and/or version of Boost you're using (which are what?).

On my machine, for an x86 release build (with link-time code generation enabled), loading the data from disk is ~9% of the overall time taken to deserialize a 1.28MB file (1 Container containing 13000 SmallObject instances, each containing 4 CustomData instances); for an x64 release build, loading the data from disk is ~17% of the overall time taken to deserialize a 1.53MB file (same object counts).

answered Nov 15 '22 by ildjarn


I'd suggest writing the number of items into the serialization stream and then using std::vector::reserve to allocate all the memory you will need. That way, you will be doing the minimum number of allocations.

answered Nov 15 '22 by Nicol Bolas