
Good Design for C++ Serialization

I'm currently searching for a good OO design to serialize a C++/Qt application.
Imagine the classes of the application organized in a tree structure, implemented with the Composite pattern, as in the following picture.

The two possible principles I thought of:

1.)
Put the save()/load() function in every class that has to be serializable. I have seen this many times, usually implemented with Boost. Somewhere in the class you will find something like this:

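// Grants Boost.Serialization access to the private serialize() member.
friend class boost::serialization::access;
template<class Archive>
void serialize(Archive & ar, const unsigned int version)
{
    ar & m_member1;  // operator& saves or loads, depending on the archive type
}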

You could also separate this into save() and load(). But the disadvantage of this approach is:
If you want to change the serialization two months later (to XML, HTML, or something very unusual that Boost does not support), you have to adapt all the thousands of classes, which in my opinion is not good OO design.
And if you want to support different serializations (XML, binary, ASCII, whatever...) at the same time, then 80% of each .cpp file exists just for serialization functions.
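
To make the trade-off concrete, here is a minimal sketch of the idea behind a templated serialize(): the class lists its members once and the archive type decides the format. It deliberately does not use Boost; `TextOut`, `XmlOut`, `Point`, `field()` and `toString()` are hypothetical names invented for this illustration and are not part of any library.

```cpp
#include <sstream>
#include <string>

// Hypothetical archive that writes fields as "name=value;" pairs.
struct TextOut {
    std::ostringstream out;
    template <class T>
    void field(const char* name, const T& value) {
        out << name << '=' << value << ';';
    }
};

// Hypothetical archive that writes the same fields as XML-like elements
// (no escaping, which a real XML writer would need).
struct XmlOut {
    std::ostringstream out;
    template <class T>
    void field(const char* name, const T& value) {
        out << '<' << name << '>' << value << "</" << name << '>';
    }
};

class Point {
public:
    Point(int x, int y) : m_x(x), m_y(y) {}

    // The class names its members once; the archive decides the format.
    template <class Archive>
    void serialize(Archive& ar) const {
        ar.field("x", m_x);
        ar.field("y", m_y);
    }

private:
    int m_x;
    int m_y;
};

// Helper: serialize any object with the given archive type.
template <class Archive, class T>
std::string toString(const T& object) {
    Archive ar;
    object.serialize(ar);
    return ar.out.str();
}
```

With these definitions, `toString<TextOut>(Point(1, 2))` gives `x=1;y=2;` and `toString<XmlOut>(Point(1, 2))` gives `<x>1</x><y>2</y>`. Adding a format means adding an archive type, not touching `Point`; whether that holds for a format with very different structure is a separate question.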

2.)
I know Boost also provides a non-intrusive version of the serialization:

"http://www.boost.org/doc/libs/1_49_0/libs/serialization/doc/tutorial.html"

So another way is to implement an iterator that walks the composite tree structure and serializes every object (and a second iterator for the deserialization).
(I think this is what the XmlSerializer class of the .NET Framework does, but I'm not really familiar with .NET.)
This sounds better because save() and load() live outside the classes and there is only one spot to change if the serialization changes. BUT:
- You have to provide a setter and a getter for every member you want to serialize. (So there is no private data anymore. Is this good or bad?)
- You could have a long inheritance hierarchy (more than 5 classes) hanging on the composite tree.
So how do you call the setters/getters of the derived classes when you can only call an interface function of the base Composite component?
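
One possible answer to the last point is the Visitor pattern: each node implements a single accept() function, and double dispatch hands the writer the concrete type, so it can call that type's getters. The sketch below is only an illustration under assumptions; `Component`, `Leaf`, `Group`, `Visitor` and `TextWriter` are hypothetical names, not classes from the application in question.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

class Leaf;
class Group;

// Hypothetical visitor interface: one overload per concrete node type.
class Visitor {
public:
    virtual ~Visitor() = default;
    virtual void visit(const Leaf& leaf) = 0;
    virtual void visit(const Group& group) = 0;
};

// Base component of the composite; it only knows how to accept a visitor.
class Component {
public:
    virtual ~Component() = default;
    virtual void accept(Visitor& v) const = 0;
};

class Leaf : public Component {
public:
    explicit Leaf(int value) : m_value(value) {}
    int value() const { return m_value; }
    // Double dispatch: the visitor receives the concrete type Leaf.
    void accept(Visitor& v) const override { v.visit(*this); }
private:
    int m_value;
};

class Group : public Component {
public:
    explicit Group(std::string name) : m_name(std::move(name)) {}
    void add(std::unique_ptr<Component> child) { m_children.push_back(std::move(child)); }
    const std::string& name() const { return m_name; }
    const std::vector<std::unique_ptr<Component>>& children() const { return m_children; }
    void accept(Visitor& v) const override { v.visit(*this); }
private:
    std::string m_name;
    std::vector<std::unique_ptr<Component>> m_children;
};

// All format-specific code lives here, outside the node classes.
class TextWriter : public Visitor {
public:
    void visit(const Leaf& leaf) override { m_out << "leaf(" << leaf.value() << ")"; }
    void visit(const Group& group) override {
        m_out << group.name() << "[";
        for (const auto& child : group.children()) child->accept(*this);
        m_out << "]";
    }
    std::string str() const { return m_out.str(); }
private:
    std::ostringstream m_out;
};
```

A second format would be a second Visitor subclass. The cost is that every new node type requires a new visit() overload in every visitor, and the nodes still need read access (getters) for whatever is written.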

Another way is to serialize the objects' data into a separate abstract format, from which all subsequent serializations (XML, text, whatever is possible) get their data. One idea was to serialize it to QDomNode. But I think the extra abstraction will cost performance.
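
As a rough sketch of this intermediate-format idea (without Qt, so not QDomNode itself): each class describes itself as a generic tree node, and independent writers turn that tree into a concrete format. `Node`, `toXml()`, `toText()` and `Circle` are hypothetical names used only for illustration, and the example assumes C++17 for the recursive `std::vector<Node>` member.

```cpp
#include <string>
#include <vector>

// Hypothetical format-neutral node: a name, a text value, and child nodes.
struct Node {
    std::string name;
    std::string value;
    std::vector<Node> children;
};

// Writer 1: XML-like output (no escaping, for brevity).
std::string toXml(const Node& n) {
    std::string out = "<" + n.name + ">" + n.value;
    for (const Node& child : n.children) out += toXml(child);
    return out + "</" + n.name + ">";
}

// Writer 2: indented plain text, two spaces per nesting level.
std::string toText(const Node& n, int depth = 0) {
    std::string out(depth * 2, ' ');
    out += n.name;
    if (!n.value.empty()) out += ": " + n.value;
    out += "\n";
    for (const Node& child : n.children) out += toText(child, depth + 1);
    return out;
}

// Hypothetical application class; it only knows how to describe itself as a Node.
class Circle {
public:
    explicit Circle(int radius) : m_radius(radius) {}
    Node toNode() const {
        return Node{"circle", "", {Node{"radius", std::to_string(m_radius), {}}}};
    }
private:
    int m_radius;
};
```

The extra tree is built in memory before anything is written, which is where the performance cost mentioned above comes from; whether it matters depends on the size of the data.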

So my question is:
Does anyone know a good OO design for serialization?
Maybe from other programming languages like Java, Python, C#, whatever...

Thank you.

grimblegrumble asked Oct 08 '22 15:10



1 Answer

Beware of serialization.

Serialization is about taking a snapshot of your in-memory representation and restoring it later on.

This is all great, except that it starts fraying at the seams when you think about loading a previously stored snapshot with a newer version of the software (Backward Compatibility) or (god forbid) a recently stored snapshot with an older version of the software (Forward Compatibility).

Many structures can easily deal with backward compatibility; forward compatibility, however, requires that your newer format stay very close to its previous iteration: basically, just add or remove some fields but keep the same overall structure.

The problem is that serialization, for performance reasons, tends to tie the on-disk structure to the in-memory representation; changing the in-memory representation then requires either deprecating the old archives or providing a migration utility (or both).

On the other hand, messaging systems (and this is what Google Protocol Buffers is) are about decoupling the structure of the exchanged messages from the in-memory representation so that your application remains flexible.

Therefore, you first need to choose whether you will implement serialization or messaging.


Now you are right that you can either write the save/load code within the class or outside it. This is once again a trade-off:

  • in-class code has immediate access to all members and is usually more efficient and straightforward, but less flexible, so it goes hand in hand with serialization
  • out-of-class code requires indirect access (getters, visitor hierarchies) and is less efficient, but more flexible, so it goes hand in hand with messaging
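
The out-of-class, messaging side of this trade-off could look roughly like the following sketch: the message structure is a separate, stable type, and free functions convert between it and the in-memory class through its public interface. `Price`, `PriceMessage`, `toMessage()`, `fromMessage()` and `encode()` are hypothetical names, and the example assumes non-negative amounts.

```cpp
#include <string>

// In-memory representation: free to change between releases.
class Price {
public:
    explicit Price(long long totalCents) : m_totalCents(totalCents) {}
    long long totalCents() const { return m_totalCents; }
private:
    long long m_totalCents;  // internal detail; assumed non-negative here
};

// Message structure: the stable contract exchanged with other programs.
struct PriceMessage {
    long long units;  // whole currency units
    int cents;        // 0..99
};

// Out-of-class conversion: uses only the public getter of Price.
PriceMessage toMessage(const Price& p) {
    return PriceMessage{p.totalCents() / 100, static_cast<int>(p.totalCents() % 100)};
}

Price fromMessage(const PriceMessage& m) {
    return Price(m.units * 100 + m.cents);
}

// Wire encoding of the message, independent of how Price stores its value.
std::string encode(const PriceMessage& m) {
    return std::to_string(m.units) + "." + (m.cents < 10 ? "0" : "") + std::to_string(m.cents);
}
```

If `Price` later switches to a different internal representation, only `toMessage()` and `fromMessage()` change; `PriceMessage` and its encoding stay as they are. The price is the extra conversion step on every save and load.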

Note that hidden state is not a drawback here. A class has no (truly) hidden state:

  • caches (mutable values) are just that, they can be lost without worry
  • hidden types (think FILE* or another handle) are normally recoverable by other means (serializing the name of the file, for example)
  • ...

Personally I use a mix of both.

  • Caches are written for the current version of the program and use fast (de)serialization in v1. New code is written to work with both v1 and v2, and writes v1 by default until the previous version disappears; it then switches to writing v2 (assuming that is easy). Occasionally a massive refactoring makes backward compatibility too painful; we drop it on the floor at that point (and increment the major version number).
  • On the other hand, exchanges with other applications/services and more durable storage (blobs in a database or in files) use messaging because I don't want to tie myself down to a particular code structure for the next 10 years.
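
The v1/v2 transition described in the first bullet might be sketched like this: the loader understands both versions, while the writer keeps emitting v1 by default until old readers are gone. The `Settings` struct, the space-separated text layout and the `save()`/`load()` functions are all assumptions made up for this illustration; the host is assumed to contain no whitespace.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

struct Settings {
    std::string host;
    int port = 80;  // added in format v2; v1 archives fall back to this default
};

// Writes the requested format version. v1 stays the default during the
// transition, so a v1 archive silently omits the port.
std::string save(const Settings& s, int version = 1) {
    std::ostringstream out;
    out << version << ' ' << s.host;
    if (version >= 2) out << ' ' << s.port;
    return out.str();
}

// Reads both v1 and v2 archives; anything else is rejected.
Settings load(const std::string& data) {
    std::istringstream in(data);
    int version = 0;
    Settings s;
    if (!(in >> version >> s.host)) throw std::runtime_error("malformed archive");
    if (version == 2) {
        if (!(in >> s.port)) throw std::runtime_error("malformed v2 archive");
    } else if (version != 1) {
        throw std::runtime_error("unsupported archive version");
    }
    return s;
}
```

Once no v1-only reader remains, the default argument of `save()` can be switched to 2, which is the "then switches to writing v2" step.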

Note: I work on server applications, so my advice reflects the particulars of such an environment. I imagine client-side apps have to support old versions forever...

Matthieu M. answered Oct 12 '22 12:10
