
Cross-platform and language (de)serialization

I'm looking for the most convenient way to serialize a bunch of C++ structs so that the serialized form is portable across C++ and Java (at a minimum) and across 32-bit/64-bit and big/little-endian platforms. The structs to be serialized contain only data, i.e. they're pure data objects with no behavior.

The idea is to serialize the structs into an octet blob that we can store in a database "generically" and read back later. That avoids changing the database whenever a struct changes, and also avoids assigning each data member to its own field, i.e. we want a single table that holds everything "generically" as a binary blob. This should mean less work for developers and fewer changes when structures change.
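To make the requirement concrete, here is a minimal hand-rolled sketch of what an endian-neutral blob could look like. The `Sample` struct, its fields, and the wire layout (fixed-width big-endian integers, length-prefixed string bytes) are assumptions chosen for illustration, not an existing format. Big-endian is picked because Java's `DataInputStream` and a default `ByteBuffer` read integers in that order.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical pure-data struct, used only for illustration.
struct Sample {
    std::uint32_t id;
    std::int64_t  timestamp;
    std::string   name;   // assumed to hold UTF-8 bytes
};

using Blob = std::vector<std::uint8_t>;

// Append an unsigned integer, most significant byte first.
// Shifts operate on values, not memory, so host endianness does not matter.
template <typename T>
void put_be(Blob& out, T value) {
    for (int shift = static_cast<int>(sizeof(T) - 1) * 8; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(value >> shift));
}

// Read an unsigned integer written by put_be and advance pos.
template <typename T>
T get_be(const Blob& in, std::size_t& pos) {
    if (pos > in.size() || in.size() - pos < sizeof(T))
        throw std::runtime_error("truncated blob");
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        value = static_cast<T>((value << 8) | in[pos++]);
    return value;
}

Blob serialize(const Sample& s) {
    Blob out;
    put_be<std::uint32_t>(out, s.id);
    put_be<std::uint64_t>(out, static_cast<std::uint64_t>(s.timestamp));
    // Strings are written as a 32-bit length followed by the raw bytes.
    put_be<std::uint32_t>(out, static_cast<std::uint32_t>(s.name.size()));
    out.insert(out.end(), s.name.begin(), s.name.end());
    return out;
}

Sample deserialize(const Blob& in) {
    std::size_t pos = 0;
    Sample s;
    s.id = get_be<std::uint32_t>(in, pos);
    s.timestamp = static_cast<std::int64_t>(get_be<std::uint64_t>(in, pos));
    const std::uint32_t len = get_be<std::uint32_t>(in, pos);
    if (in.size() - pos < len)
        throw std::runtime_error("truncated blob");
    const auto first = in.begin() + static_cast<std::ptrdiff_t>(pos);
    s.name.assign(first, first + static_cast<std::ptrdiff_t>(len));
    return s;
}
```

The catch with a layout like this is that it has no field tags or version number, so any change to the struct breaks previously stored blobs. Coping with that kind of change is the main thing a schema-driven format would add.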

I've looked at Boost.Serialization, but I don't think there's a way to make it compatible with Java. The same goes for implementing Serializable in Java.

If there is a way to do it starting from an IDL file, that would be best, as we already have IDL files that describe the structures.

Cheers in advance!

fwg Avatar asked Sep 14 '09 13:09



4 Answers

I stumbled here, having a very similar question. 6 years later, this might not be useful to you, but hopefully it will be to others.

There are a lot of alternatives, unfortunately with no clear winner (although one could argue that JSON is the clear winner). Even Google has released multiple competing technologies (all of them apparently being used internally):

  • FlatBuffers: this one seems to meet the requirements from the original question, has interesting benchmarks and supports some form of IDL (I'm personally not familiar with IDL)
  • Protocol Buffers: mentioned previously.
  • XFJSON: 5%-12% smaller than JSON.

Not to forget the alternatives posted in the other answers. Here are a few more:

  • YAML: JSON minus all the double quotes, but using indentation instead. It's more human readable, but probably less efficient, especially as it gets larger.
  • BSON (Binary JSON)
  • MessagePack (Another compacted JSON)

Even with so many variations, JSON is clearly the winner in terms of simplicity/convenience and cross-platform access. It has gained even more popularity in the last couple of years with the rise of JavaScript. A lot of people probably use it as a de facto solution without giving it much thought (that's what I originally did :P).

However, if size becomes an issue and you prefer to keep things simple rather than use one of the more advanced libraries, you could just compress the JSON using zlib (that's what I'm doing now) or some other cross-platform algorithm (but that's a whole other topic).

To speed up JSON handling in C++, you could also use RapidJSON.

Marco Roy Avatar answered Oct 15 '22 23:10



You need ASN.1! (Some people refer to this as binary XML.) ASN.1 is very compact and thus ideal to transfer data between two systems. And for those who don't think this is ever used: several Internet protocols are based upon the ASN.1 model for data serialization!

Unfortunately, there aren't many libraries available for Java or C++ that support ASN.1. I had to work with it several years ago and just couldn't find a good, free or inexpensive tool for ASN.1 support in C++. Objective Systems sells ASN.1/XML solutions, but they are extremely expensive. (The ASN.1 compiler for C++ and Java, that is!) It costs you an arm and a leg at least! (But then you will have a tool that you can use with only one hand...)

Wim ten Brink Avatar answered Oct 15 '22 22:10



I'm surprised Jon Skeet hasn't already pounced on this one :-)

Protocol Buffers is pretty much designed for this sort of scenario -- passing structured data cross-language.

That said, if you're using a database the way you suggest, you really shouldn't be using a full-strength RDBMS like Oracle or SQL Server but rather a lightweight key-value store such as Berkeley DB or one of the many "cloud table" engines.

Jeffrey Hantin Avatar answered Oct 15 '22 21:10



If I want to go really, really cross-language, I would normally suggest JSON because of the ease of JavaScript support, the abundance of libraries, and the fact that it is human readable and modifiable (I prefer it to XML, as I find it smaller in terms of characters, faster, and more readable). It's not the most efficient in terms of space, however, and a more machine-readable format like Protocol Buffers or Thrift would have advantages there (Thrift code can be generated from an IDL, but it is also designed for encoding services, so it could be heavier than you want).
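As a rough illustration of how little machinery JSON needs for a plain data struct, here is a minimal sketch that writes one by hand using only the standard library. The `Record` struct and the helper names `json_escape` and `to_json` are hypothetical. A real project would more likely use an existing JSON library, which would also handle parsing.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical pure-data struct, used only for illustration.
struct Record {
    std::uint32_t id;
    std::string   name;   // assumed to hold UTF-8 text
};

// Escape the characters that JSON does not allow unescaped inside a string:
// the double quote, the backslash, and control characters below 0x20.
std::string json_escape(const std::string& in) {
    static const char hex[] = "0123456789abcdef";
    std::ostringstream out;
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out << "\\\""; break;
            case '\\': out << "\\\\"; break;
            case '\n': out << "\\n";  break;
            case '\r': out << "\\r";  break;
            case '\t': out << "\\t";  break;
            default:
                if (c < 0x20)
                    out << "\\u00" << hex[c >> 4] << hex[c & 0xF];
                else
                    out << static_cast<char>(c);
        }
    }
    return out.str();
}

// Write the struct as a compact JSON object.
std::string to_json(const Record& r) {
    std::ostringstream out;
    out << "{\"id\":" << r.id
        << ",\"name\":\"" << json_escape(r.name) << "\"}";
    return out.str();
}
```

Because every field is written with its name, a reader in Java or any other language can ignore keys it doesn't know and default the ones that are missing. That is what makes the format tolerant of struct changes, at the cost of the extra bytes.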

Todd Gardner Avatar answered Oct 15 '22 21:10
