
boost::serialization high memory consumption during serialization


As the title suggests, I've run into an issue with boost::serialization when serializing a huge amount of data to a file. The serialization part of the application takes around 3 to 3.5 times the memory of the objects being serialized.
It is important to note that my data structure is a three-dimensional vector of base class pointers, held through a pointer to that structure. Like this:

using namespace std;    
vector<vector<vector<MyBase*> > >* data;

This is later serialized with code analogous to this:

ar & BOOST_SERIALIZATION_NVP(data);

boost/serialization/vector.hpp is included.

The classes being serialized all inherit from "MyBase".
Since the start of my project I've used different archives for serialization: the typical binary_archive, text, XML and finally the polymorphic binary/XML/text archives. Every single one of them behaves exactly the same way.

Typically this wouldn't be a problem if I had to serialize small amounts of data, but the number of objects I have is in the millions (ideally around 10 million). My tests consistently show that the memory allocated by the boost::serialization part of the code is around 2/3 of the application's whole memory footprint while writing the file.

This amounts to around 13.5 GB of RAM taken for 4 million objects, where the objects themselves take 4.2 GB. This is as far as I've been able to take my code, since I don't have access to a machine with more than 8 GB of physical RAM. I should also note that this is a 64-bit application running on Windows 7 Professional x64, but the situation is similar on an Ubuntu box.

Does anyone have an idea how I would go about troubleshooting this? It is unacceptable for me to have such high memory requirements for an application that will not use as much memory while running as it does while serializing.

Deserialization isn't as bad, as it allocates around 1.5 times the needed memory. This is something I could live with.

I tried turning tracking off with boost::archive::archive_flags::no_tracking, but it behaves exactly the same.

Anyone have any idea what I should do?

Max021 asked Dec 29 '10 22:12


1 Answer

Using valgrind I found that the main cause of the memory consumption is a map inside the library that tracks pointers. If you are certain that you do not need pointer tracking (that is, you are sure there is no pointer aliasing), disable tracking. The Boost.Serialization documentation on class serialization traits describes the main concepts of disabling tracking. In short, you must do something like this:

BOOST_CLASS_TRACKING(vector<vector<vector<MyBase*> > >, boost::serialization::track_never)

In my own question I wrote a version of this macro that can disable tracking for a class template. This should have a significant impact on your memory consumption. Also note that the containers hold pointers: if you want tracking to never happen, you must disable tracking for the pointed-to types as well. So far I have not found a way to do this properly.
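To make the cost of such a pointer-tracking map concrete, here is a toy sketch. It is not Boost's actual implementation: `ToyTrackingArchive`, `DemoResult`, `run_demo` and the minimal `MyBase` are hypothetical names used only for illustration. The idea is that a tracking archive has to remember every address it has already written, so the map grows by one entry per object, while an untracked archive keeps no such state but writes an aliased object twice.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for the base class from the question.
struct MyBase {
    virtual ~MyBase() = default;
    int value = 0;
};

// Toy archive that mimics pointer tracking (an assumption for illustration,
// not how Boost.Serialization is actually written).
class ToyTrackingArchive {
public:
    explicit ToyTrackingArchive(bool track) : track_(track) {}

    void save(const MyBase* p) {
        assert(p != nullptr);
        if (track_) {
            // Remember the address; an already-seen address is written
            // as a back-reference instead of a second copy of the object.
            auto inserted = seen_.emplace(p, seen_.size());
            if (!inserted.second) {
                ++references_written_;
                return;
            }
        }
        ++objects_written_;
    }

    std::size_t tracked_entries() const { return seen_.size(); }
    std::size_t objects_written() const { return objects_written_; }
    std::size_t references_written() const { return references_written_; }

private:
    bool track_;
    std::map<const void*, std::size_t> seen_;  // one node per tracked object
    std::size_t objects_written_ = 0;
    std::size_t references_written_ = 0;
};

struct DemoResult {
    std::size_t tracked_objects;
    std::size_t tracked_entries;
    std::size_t untracked_objects;
    std::size_t untracked_entries;
};

// Saves `count` distinct objects plus one aliased pointer through both archives.
DemoResult run_demo(std::size_t count) {
    assert(count > 0);
    std::vector<MyBase> storage(count);
    std::vector<const MyBase*> pointers;
    for (const MyBase& obj : storage) pointers.push_back(&obj);
    pointers.push_back(&storage[0]);  // the aliased pointer

    ToyTrackingArchive tracked(true);
    ToyTrackingArchive untracked(false);
    for (const MyBase* p : pointers) {
        tracked.save(p);
        untracked.save(p);
    }
    return {tracked.objects_written(), tracked.tracked_entries(),
            untracked.objects_written(), untracked.tracked_entries()};
}
```

Calling `run_demo(1000)` leaves the tracked archive with 1000 map entries and 1000 objects written, while the untracked one holds no entries but writes 1001 objects because the aliased pointer is saved twice. That is the trade-off behind track_never: it is only safe when no object is reachable through more than one pointer.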

motam answered Oct 18 '22 10:10