
Easy to use extensible serialization/marshalling?

I have a question about serialization of data structures. There are many possibilities for serializing data structures (also called marshalling or deflating, see wiki-article). Every programming language, framework, standard or library seems to bring its own serialization method with it. Many also define their own data/interface description language (which I prefer to language-dependent data structures defined only inside the code). Just to name a few (see wiki-article): COM IDL, CORBA IDL, Thrift IDL, Google Protocol Buffers ".proto", XSD, ASN.1 IDL, and so on. Some of these serialization systems can generate language-native data structures and the code to serialize and deserialize them.

I did some research on this subject, but I am still undecided. So my question is: Which serialization should I use?

My requirements: extensibility, space efficiency (at least a binary format), efficient access to data, ease of use (possibly with generated code and getters and setters) and C++ compatibility.

The extensibility should provide forward and backward compatibility. To be more specific, the data formats I write will often grow over time, because I add new data fields that I couldn't foresee at the beginning of development. I would like to be able to read data stored in an outdated format with a newer software version; the data fields not found in the old stored data could be filled with default values or something similar. Conversely, I would like software compiled with the "old" data description to be able to read data written with the new description. The unknown data fields should then be ignored (maybe generating a warning).
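To make this requirement concrete, here is a minimal hand-rolled sketch of a tagged format in which every field is stored as [tag][length][payload]. The `Settings` record, its tag numbers and its default values are made up for illustration; none of this is taken from a particular library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical record, used only for illustration.
struct Settings {
    std::uint8_t volume = 50;      // tag 1, keeps this default if absent
    std::uint8_t brightness = 80;  // tag 2, keeps this default if absent
    int skipped_fields = 0;        // fields the reader did not understand
};

// Append one field as [tag][length][payload...].
// Sketch limitation: the payload must be shorter than 256 bytes.
void put_field(Bytes& out, std::uint8_t tag, const Bytes& payload) {
    out.push_back(tag);
    out.push_back(static_cast<std::uint8_t>(payload.size()));
    out.insert(out.end(), payload.begin(), payload.end());
}

// Read all fields; returns false if the buffer is truncated.
bool read_settings(const Bytes& in, Settings& s) {
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in.size() - pos < 2) return false;    // no room for tag + length
        const std::uint8_t tag = in[pos];
        const std::size_t len = in[pos + 1];
        pos += 2;
        if (in.size() - pos < len) return false;  // payload cut off
        if (tag == 1 && len == 1) {
            s.volume = in[pos];
        } else if (tag == 2 && len == 1) {
            s.brightness = in[pos];
        } else {
            ++s.skipped_fields;  // unknown field: skip it (a warning could go here)
        }
        pos += len;  // the length prefix is what makes skipping possible
    }
    return true;
}

void demo_tagged_fields() {
    // "Old" data: written before brightness existed.
    Bytes old_data;
    put_field(old_data, 1, {10});
    Settings a;
    assert(read_settings(old_data, a));
    assert(a.volume == 10 && a.brightness == 80);  // missing field kept its default

    // "New" data: contains tag 3, which this reader does not know.
    Bytes new_data;
    put_field(new_data, 1, {20});
    put_field(new_data, 3, {1, 2, 3});
    put_field(new_data, 2, {30});
    Settings b;
    assert(read_settings(new_data, b));
    assert(b.volume == 20 && b.brightness == 30 && b.skipped_fields == 1);
}
```

The length prefix is the design choice that matters here: a reader can step over a field without knowing what it means, and fields that are absent simply leave the defaults untouched.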

Any recommendations? Recommendations for further reading on the subject would be appreciated too.

--- Edit ---

1) boost::serialization seems to be quite popular. It has some really nice features, the documentation is very good, and the syntax seems to be quite straightforward. Maybe I am a bit picky, but there are some things I dislike: I don't see how it could handle forward compatibility (see 4). I would prefer generated code.

2) Google protobuf seems to fit my needs better, but I haven't looked into it in depth. It seems to handle forward and backward compatibility well (see 5). There are code generators for different languages, and the developers are aware of very similar concepts like (see FAQ). I will have a deeper look into protobufs.

3) boost spirit does not seem to be what I am looking for.

Metaprogger asked Feb 24 '11 23:02



2 Answers

boost::serialization is great

  • Supports different versions of archive
  • Good support of most data structures (pointers, vectors...)
  • Very fast (10 seconds for 1 GB, so the limiting factor is your hard drive)
  • Rather easy to use
  • On the fly compression if used with boost::iostreams

The drawbacks are :

  • The archive might not be compatible from one platform to another
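One common reason binary data in general fails to move between platforms is that integers get written in the host's native byte order and size. The following is a minimal sketch of the usual workaround, writing a 32-bit value in a fixed little-endian layout regardless of the host. It is not how Boost implements its archives; the function names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Write v as four bytes, least significant byte first, on any host.
void put_u32_le(Bytes& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFF));
    }
}

// Read four bytes starting at pos; returns false if fewer than four remain.
bool get_u32_le(const Bytes& in, std::size_t pos, std::uint32_t& v) {
    if (pos > in.size() || in.size() - pos < 4) return false;
    v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    }
    return true;
}

void demo_fixed_byte_order() {
    Bytes buf;
    put_u32_le(buf, 0x11223344u);
    const Bytes expected = {0x44, 0x33, 0x22, 0x11};
    assert(buf == expected);  // same bytes on big- and little-endian hosts

    std::uint32_t back = 0;
    assert(get_u32_le(buf, 0, back));
    assert(back == 0x11223344u);
}
```

Because the code uses shifts instead of copying the in-memory representation, the bytes produced do not depend on the machine that runs it.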
  • Only for C++, no exchange with other languages

A nice alternative that is growing is Protocol Buffers from Google: http://code.google.com/p/protobuf/

  • Language independent
  • Version support
  • Very fast
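Part of what makes version support workable in Protocol Buffers is that every value on the wire is prefixed with a key holding the field number and a wire type, so a reader can recognize fields it does not know. The sketch below hand-codes just that piece (base-128 varints and the key layout) to show the idea; it is not the generated protobuf API, and the helper names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte.
void put_varint(Bytes& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// Decode a varint at pos and advance pos; returns false on truncated input.
bool get_varint(const Bytes& in, std::size_t& pos, std::uint64_t& v) {
    v = 0;
    for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
        const std::uint8_t b = in[pos++];
        v |= static_cast<std::uint64_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) return true;
    }
    return false;
}

// Key = (field_number << 3) | wire_type; wire type 0 means "varint".
void put_varint_field(Bytes& out, std::uint32_t field_number, std::uint64_t value) {
    put_varint(out, static_cast<std::uint64_t>(field_number) << 3);
    put_varint(out, value);
}

void demo_wire_format() {
    Bytes msg;
    put_varint_field(msg, 1, 150);  // field number 1, value 150
    const Bytes expected = {0x08, 0x96, 0x01};
    assert(msg == expected);

    std::size_t pos = 0;
    std::uint64_t key = 0, value = 0;
    assert(get_varint(msg, pos, key));
    assert((key >> 3) == 1 && (key & 7) == 0);  // field number and wire type
    assert(get_varint(msg, pos, value));
    assert(value == 150);
}
```

In real use you would write a `.proto` file and let `protoc` generate the C++ classes; this sketch only illustrates why adding a new numbered field does not have to break older readers.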

So if you want to exchange data between different systems, I would go with Protocol Buffers. However, if you have a single application, I'd use boost::serialization.

Tristram Gräbener answered Nov 20 '22 09:11



I used boost's serialization library for a while - it's extensible all right, efficient, and supports separate versioning for each object you're serializing. All these features of course mean that it's a complex beast, and it takes some time to learn properly. Not that snappy to compile either. And if you ever try to bring it to a platform that is not officially supported, expect to debug some very convoluted code. File compatibility across platforms can be slightly flaky, and forward compatibility won't work. Overall, boost serialization is generally not a good choice if you need application instances to communicate with each other. Still, it's not all that bad for the right project.
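To illustrate why a per-object version number gives backward but not forward compatibility, here is a small hand-rolled sketch of the idea. It deliberately does not use Boost's API; the `Point` type, the layout, and the `reader_version` parameter (used to simulate an older build) are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical type: version 0 stored x and y, version 1 added z.
constexpr std::uint8_t kPointVersion = 1;

struct Point {
    std::uint8_t x = 0, y = 0, z = 0;
};

// Layout: [version][x][y][z]
Bytes save(const Point& p) {
    return {kPointVersion, p.x, p.y, p.z};
}

// reader_version stands in for "the version this build was compiled with".
bool load(const Bytes& in, Point& p, std::uint8_t reader_version = kPointVersion) {
    if (in.empty()) return false;
    const std::uint8_t file_version = in[0];
    // Data from a newer version: without per-field lengths the reader cannot
    // tell where unknown members end, so all it can do is refuse.
    if (file_version > reader_version) return false;
    const std::size_t need = (file_version >= 1) ? 4 : 3;
    if (in.size() < need) return false;
    p.x = in[1];
    p.y = in[2];
    p.z = (file_version >= 1) ? in[3] : 0;  // default for data written by version 0
    return true;
}

void demo_versioned_object() {
    // Backward compatibility: a current reader loads version 0 data.
    const Bytes old_file = {0, 3, 4};
    Point p;
    assert(load(old_file, p));
    assert(p.x == 3 && p.y == 4 && p.z == 0);

    // No forward compatibility: a simulated version 0 reader rejects version 1 data.
    Point q;
    q.x = 1; q.y = 2; q.z = 9;
    const Bytes new_file = save(q);
    Point r;
    assert(!load(new_file, r, 0));
}
```

The newer reader can branch on the stored version, but an older reader has no way to interpret or skip members that were added after it was compiled.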

http://www.boost.org/doc/libs/1_46_0/libs/serialization/doc/index.html

Boost also has the newer Spirit library for more generic parsing / output, but I haven't used it and wouldn't recommend it based on first impressions - it takes some digging to even understand what the peculiarly named library is meant for.

In the end, for simpler projects rolling your own serialization library might not be a bad choice either - it's not too hard, and you get exactly the features you need. It's kind of disappointing that the C++ world still doesn't seem to have serialization solved adequately, but that's the conclusion I reached last time I had to decide on serialization functionality. Using boost's serialization for a while gave a good idea of what to aim for in my own implementation, though.

Olli Etuaho answered Nov 20 '22 10:11
