I'm building a distributed C++ application that needs to do lots of serialization and deserialization of simple data structures that's being passed between different processes and computers.
I'm not interested in serializing complex class hierarchies, but more of sending structures with a few simple members such as number, strings and data vectors. The data vectors can often be many megabytes large. I'm worried that text/xml-based ways of doing it is too slow and I really don't want to write this myself since problems like string encoding and number endianess can make it way more complicated than it looks on the surface.
I've been looking a bit at protocol buffers and boost.serialize. According to the documents protocol buffers seems to care much about performance. Boost seems somewhat more lightweight in the sense that you don't have an external language for specifying the data format which I find quite convenient for this particular project.
So my question comes down to this: does anyone know if the boost serialization is fast for the typical use case I described above?
Also if there are other libraries that might be right for this, I'd be happy to hear about them.
I would strongly suggest protocol buffers. They're incredibly simple to use, offer great performance, and take care of issues like endianness and backwards compatibility. To make it even more attractive, serialized data is language-independent thanks to numerous language implementations.
ACE and ACE TAO come to mind, but you might not like the size and scope of it. http://www.cs.wustl.edu/~schmidt/ACE.html
Regarding your query about "fast" and boost. That is a subjective term and without knowing your requirements (throughput, etc) it is difficult to answer that for you. Not that I have any benchmarks for the boost stuff myself...
There are messaging layers you can use, but those are probably slower than boost. I'd say that you identified a good solution in boost, but I've only used ACE and other proprietary communications/messaging products.
My guess is that boost is fast enough. I have used it in previous projects to serialize data to and from disk, and its performance never even came up as an issue.
My answer here talks about serialization in general, which may be helpful to you beyond which serialization library you choose to use.
Having said that, it looks like you know most of the main trouble spots with serialization (endianess string encoding). You did leave out versioning and forwards/backwards compatibility. If time is not critical I recommend writing your own serialization code. It is an enlightening experience, and the lessons you learn are invaluable. Though I will warn you it will tend to make you hate XML based protocols for their bloatedness. :)
Whichever path you choose good luck with your project.
Also check out ONC-RPC (old SUN-RPC)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With