Whenever I find myself needing to serialize objects in a C++ program, I fall back to this kind of pattern:
    class Serializable {
    public:
        virtual ~Serializable() {}

        static Serializable *deserialize(istream &is) {
            int id;
            is >> id;
            switch(id) {
                case EXAMPLE_ID:
                    return new ExampleClass(is);
                //...
                default:
                    return NULL; // unknown class ID
            }
        }

        void serialize(ostream &os) {
            os << getClassID();
            serializeMe(os);
        }
    protected:
        // Pure virtual functions must be declared virtual
        virtual int getClassID()=0;
        virtual void serializeMe(ostream &os)=0;
    };
The above works pretty well in practice. However, I've heard that this kind of switching over class IDs is evil and an antipattern; what's the standard, OO-way of handling serialization in C++?
In Java, an object is serialized by calling the writeObject() method of the ObjectOutputStream class and deserialized by calling the readObject() method of the ObjectInputStream class. The class must implement the Serializable interface for its objects to be serialized.
Serialization is the process of converting an object into a stream of bytes to store the object or transmit it to memory, a database, or a file. Its main purpose is to save the state of an object in order to be able to recreate it when needed. The reverse process is called deserialization.
Something like Boost.Serialization, while by no means a standard, is a (for the most part) very well-written library that does the grunt work for you.
The last time I had to manually parse a predefined record structure with a clear inheritance tree, I ended up using the factory pattern with registrable classes (i.e. using a map of key to a (template) creator function rather than a lot of switch statements) to try and avoid the issue you were having.
EDIT
A basic C++ implementation of the object factory mentioned in the above paragraph.
    #include <cstddef>
    #include <map>
    #include <utility>

    /**
     * A class for creating objects, with the type of object created based on a key
     *
     * @param K the key
     * @param T the super class that all created classes derive from
     */
    template<typename K, typename T>
    class Factory {
    private:
        typedef T *(*CreateObjectFunc)();

        /**
         * A map of keys (K) to functions (CreateObjectFunc)
         * When creating a new type, we simply call the function with the required key
         */
        std::map<K, CreateObjectFunc> mObjectCreator;

        /**
         * Pointers to this function are inserted into the map and called when creating objects
         *
         * @param S the type of class to create
         * @return an object with the type of S
         */
        template<typename S>
        static T* createObject(){
            return new S();
        }
    public:

        /**
         * Registers a class so that it can be created via createObject()
         *
         * @param S the class to register, this must be a subclass of T
         * @param id the id to associate with the class. This ID must be unique
         */
        template<typename S>
        void registerClass(K id){
            if (mObjectCreator.find(id) != mObjectCreator.end()){
                //your error handling here
            }
            //no explicit template arguments on make_pair: they fail to compile since C++11
            mObjectCreator.insert( std::make_pair(id, static_cast<CreateObjectFunc>(&createObject<S>)) );
        }

        /**
         * Returns true if a given key exists
         *
         * @param id the id to check exists
         * @return true if the id exists
         */
        bool hasClass(K id){
            return mObjectCreator.find(id) != mObjectCreator.end();
        }

        /**
         * Creates an object based on an id. It will return null if the key doesn't exist
         *
         * @param id the id of the object to create
         * @return the new object or null if the object id doesn't exist
         */
        T* createObject(K id){
            //Don't use hasClass here as doing so would involve two lookups
            typename std::map<K, CreateObjectFunc>::iterator iter = mObjectCreator.find(id);
            if (iter == mObjectCreator.end()){
                return NULL;
            }
            //calls the required createObject() function
            return ((*iter).second)();
        }
    };
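To connect this back to the question, here is a minimal sketch of how a registry of creator functions could replace the switch in deserialize(). The names Shape, Circle, Square and ShapeRegistry are made up for illustration, the text format (an integer ID followed by the fields) is an assumption, and unlike the Factory above the creators take the input stream so that each class can read its own fields.

```cpp
#include <istream>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical base class, loosely modelled on the question's Serializable.
struct Shape {
    virtual ~Shape() {}
    virtual std::string name() const = 0;
};

struct Circle : Shape {
    int radius;
    explicit Circle(std::istream& is) : radius(0) { is >> radius; }
    std::string name() const override { return "Circle"; }
};

struct Square : Shape {
    int side;
    explicit Square(std::istream& is) : side(0) { is >> side; }
    std::string name() const override { return "Square"; }
};

// Maps a class ID to a function that builds the object from a stream.
class ShapeRegistry {
public:
    typedef std::unique_ptr<Shape> (*Creator)(std::istream&);

    // Associates an ID with the creator for S (a later registration overwrites an earlier one).
    template <typename S>
    void registerClass(int id) { creators_[id] = &create<S>; }

    // Reads the class ID, then dispatches through the map instead of a switch.
    // Returns null if the ID cannot be read or is not registered.
    std::unique_ptr<Shape> deserialize(std::istream& is) const {
        int id = 0;
        if (!(is >> id)) return nullptr;
        std::map<int, Creator>::const_iterator it = creators_.find(id);
        if (it == creators_.end()) return nullptr;
        return it->second(is);
    }

private:
    template <typename S>
    static std::unique_ptr<Shape> create(std::istream& is) {
        return std::unique_ptr<Shape>(new S(is));
    }

    std::map<int, Creator> creators_;
};
```

With Circle registered under 1 and Square under 2, feeding the stream "2 5 1 3" to deserialize() twice should yield a Square and then a Circle. Adding a new class then only needs a new registerClass call rather than a new case label.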
Serialization is a touchy topic in C++...
Quick question:
The two (serialization and messaging) are both useful, and each has its use.
Boost.Serialization is usually the most recommended library for serialization, though the odd choice of operator&, which serializes or deserializes depending on the const-ness, is really an abuse of operator overloading for me.
For messaging, I would rather suggest Google Protocol Buffers. They offer a clean syntax for describing the message and generate encoders and decoders for a huge variety of languages. There is also one other advantage when performance matters: it allows lazy deserialization (i.e. deserializing only part of the blob at once) by design.
Moving on
Now, as for the details of implementation, it really depends on what you wish.
A tag + factory system is only necessary for polymorphic classes. You will then need one factory per inheritance tree (kind)... the code can be templatized, of course!
For pointers and references, each object of a given kind is given an id, unique for its kind, and so I serialize the id rather than a pointer. Some frameworks handle it as long as you don't have circular dependencies and serialize the objects pointed to / referenced first.
Personally, I try as much as I can to separate the serialization / deserialization code from the actual code that runs the class. In particular, I try to isolate it in the source files so that changes to this part of the code do not break binary compatibility.
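As a rough illustration of serializing an id rather than a pointer, here is a sketch under stated assumptions: Node, save, load and the sentinel kNoId are hypothetical names, labels contain no whitespace, and the format is plain text. Instead of requiring that pointees be written first, it resolves ids in a second pass once every object exists, which is a different choice from the ordering approach mentioned above.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical node type: each object has an id unique among Nodes
// and refers to another Node through a raw pointer at run time.
struct Node {
    int id;
    std::string label;
    Node* next;  // may be null
};

const int kNoId = -1;  // assumed sentinel meaning "null pointer"

// Writes the id of the pointee rather than the pointer value itself.
void save(std::ostream& os, const std::vector<Node>& nodes) {
    os << nodes.size() << '\n';
    for (const Node& n : nodes)
        os << n.id << ' ' << n.label << ' ' << (n.next ? n.next->id : kNoId) << '\n';
}

// Pass 1 reads every node and remembers the referenced id;
// pass 2 turns ids back into pointers once all nodes exist.
std::vector<Node> load(std::istream& is) {
    std::size_t count = 0;
    is >> count;
    std::vector<Node> nodes(count);
    std::vector<int> nextIds(count, kNoId);
    for (std::size_t i = 0; i < count; ++i) {
        is >> nodes[i].id >> nodes[i].label >> nextIds[i];
        nodes[i].next = nullptr;
    }
    std::map<int, Node*> byId;
    for (Node& n : nodes) byId[n.id] = &n;
    for (std::size_t i = 0; i < count; ++i) {
        std::map<int, Node*>::iterator it = byId.find(nextIds[i]);
        if (it != byId.end()) nodes[i].next = it->second;
    }
    return nodes;
}
```

The pointers in the loaded vector refer to the new objects, not to the addresses that existed when the data was saved, which is the whole point of going through ids.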
On versioning
I usually try to keep serialization and deserialization of one version close together. It's easier to check that they are truly symmetric. I also try to abstract the versioning handling directly in my serialization framework + a few other things, because DRY should be adhered to :)
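One way to picture "keeping one version close together" is the sketch below. Person, its two versions and the text format are assumptions made for illustration only; the idea is simply that the writer and reader of each version sit next to each other, and a leading version number selects the reader.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical record: version 1 stored only `name`, version 2 added `age`.
struct Person {
    std::string name;
    int age;
};

// --- version 1: writer and reader kept side by side ---
void writeV1(std::ostream& os, const Person& p) { os << p.name << ' '; }
void readV1(std::istream& is, Person& p) { is >> p.name; p.age = 0; }

// --- version 2: same layout plus the age field ---
void writeV2(std::ostream& os, const Person& p) { os << p.name << ' ' << p.age << ' '; }
void readV2(std::istream& is, Person& p) { is >> p.name >> p.age; }

const int kCurrentVersion = 2;

// Always writes the newest version, preceded by its number.
void serialize(std::ostream& os, const Person& p) {
    os << kCurrentVersion << ' ';
    writeV2(os, p);
}

// Reads the version number and dispatches to the matching reader.
Person deserialize(std::istream& is) {
    int version = 0;
    is >> version;
    Person p;
    switch (version) {
        case 1: readV1(is, p); break;
        case 2: readV2(is, p); break;
        default:
            throw std::runtime_error("No version " + std::to_string(version) +
                                     " for object Person: only 1 and 2");
    }
    return p;
}
```

Old data such as "1 bob " still loads (with age defaulted to 0), while an unknown version number is reported instead of being silently misread.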
On error-handling
To ease error detection, I usually use a pair of 'markers' (special bytes) to separate one object from another. It allows me to throw immediately during deserialization because I can detect a desynchronization of the stream (i.e. something ate too many bytes or did not eat enough).
If you want permissive deserialization, i.e. deserializing the rest of the stream even if something failed before, you'll have to move toward byte counts: each object is preceded by its byte count and can only eat that many bytes (and is expected to eat them all). This approach is nice because it allows for partial deserialization: you can save the part of the stream required for an object and only deserialize it if necessary.
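A minimal sketch of the marker idea, assuming a text payload and two arbitrarily chosen marker bytes (kBegin and kEnd); Point, writePoint and readPoint are hypothetical names:

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

const char kBegin = '\x02';  // assumed marker values, chosen arbitrarily
const char kEnd = '\x03';

struct Point {
    int x;
    int y;
};

// Consumes one byte and throws unless it is the expected marker.
void expectMarker(std::istream& is, char marker) {
    char c = 0;
    if (!is.get(c) || c != marker)
        throw std::runtime_error("Stream is corrupted: marker not found");
}

// Surrounds the payload with a begin and an end marker.
void writePoint(std::ostream& os, const Point& p) {
    os.put(kBegin);
    os << p.x << ' ' << p.y;
    os.put(kEnd);
}

// Throws as soon as the payload does not sit exactly between the two markers.
Point readPoint(std::istream& is) {
    expectMarker(is, kBegin);
    Point p;
    if (!(is >> p.x >> p.y))
        throw std::runtime_error("Stream is corrupted: bad Point payload");
    expectMarker(is, kEnd);  // fails if the payload holds more than was read
    return p;
}
```

If a payload contains an unexpected extra field, the end-marker check fails right away rather than letting the next object start reading from the wrong position.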
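The byte-count approach could look roughly like this. It is only a sketch: writeRecord, readRecord and parsePoint are hypothetical helpers, the count is written as decimal text followed by a single space, and no upper bound is enforced on the count (a real reader would want one).

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes "<byte-count> <payload>"; the payload is opaque at this level.
void writeRecord(std::ostream& os, const std::string& payload) {
    os << payload.size() << ' ' << payload;
}

// Extracts exactly one record's bytes without interpreting them.
// Returns false if the stream ends or the announced count cannot be read.
bool readRecord(std::istream& is, std::string& payload) {
    std::size_t size = 0;
    if (!(is >> size)) return false;
    if (is.get() != ' ') return false;  // single separator after the count
    payload.assign(size, '\0');
    if (size == 0) return true;
    is.read(&payload[0], static_cast<std::streamsize>(size));
    return static_cast<std::size_t>(is.gcount()) == size;
}

// Decodes a stored payload only when it is needed.
// Succeeds only if two integers are found and every byte is consumed.
bool parsePoint(const std::string& payload, int& x, int& y) {
    std::istringstream ss(payload);
    if (!(ss >> x >> y)) return false;
    ss >> std::ws;
    return ss.eof();
}
```

Because readRecord never looks inside the payload, a record that later fails to parse does not disturb the records after it, and a payload can be kept as a string and decoded only on demand.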
Tagging (your class IDs) is useful here, not (only) for dispatching, but simply to check that you are actually deserializing the right type of object. It also allows for pretty error messages.
Here are some error messages / exceptions you may want:
No version X for object TYPE: only Y and Z
Stream is corrupted: here are the next few bytes BBBBBBBBBBBBBBBBBBB
TYPE (version X) was not completely deserialized
Trying to deserialize a TYPE1 in TYPE2
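As an illustration of the tag check and of the last message above, here is a small sketch. TypeMismatch, expectTag, Size and readSize are hypothetical names, and the tag is assumed to be the type name written as a whitespace-delimited word before the fields.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical exception carrying the actual and expected type tags.
class TypeMismatch : public std::runtime_error {
public:
    TypeMismatch(const std::string& actual, const std::string& expected)
        : std::runtime_error("Trying to deserialize a " + actual + " in " + expected) {}
};

// Reads the leading tag and throws unless it matches the expected type name.
void expectTag(std::istream& is, const std::string& expected) {
    std::string tag;
    if (!(is >> tag))
        throw std::runtime_error("Stream is corrupted: missing type tag");
    if (tag != expected) throw TypeMismatch(tag, expected);
}

struct Size {
    int width;
    int height;
};

// Checks the tag first, then reads the fields.
Size readSize(std::istream& is) {
    expectTag(is, "Size");
    Size s;
    if (!(is >> s.width >> s.height))
        throw std::runtime_error("Size was not completely deserialized");
    return s;
}
```

Here the tag is used purely as a sanity check, independently of any factory dispatch.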
Note that as far as I remember both Boost.Serialization and protobuf really help for error/version handling.
protobuf has some perks too, because of its capacity for nesting messages.
The downside is that it's harder to handle polymorphism because of the fixed format of the message. You have to design your messages carefully for that.