
Optimal Serialization of Primitive Types

We are rolling out more and more WAN deployments of our product (a .NET fat client with an IIS-hosted Remoting backend). Because of this, we are trying to reduce the size of the data on the wire.

We have overridden the default serialization by implementing ISerializable (similar to this), and we are seeing anywhere from 12% to 50% gains. Most of our efforts focus on optimizing arrays of primitive types. Is there a fancy way of serializing primitive types, beyond the obvious?

For example, today we serialize an array of ints as follows:

[4 bytes (array length)][4 bytes (element)][4 bytes (element)]...
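For reference, the fixed-width layout above might be sketched as follows. This is a language-neutral illustration in Python rather than our actual .NET code; the function names and the little-endian byte order are assumptions made for the example.

```python
import struct

def encode_int32_array(values):
    """Encode as a 4-byte length prefix followed by 4 bytes per element."""
    # "<i" means little-endian signed 32-bit; the byte order is an assumption.
    header = struct.pack("<i", len(values))
    body = struct.pack(f"<{len(values)}i", *values)
    return header + body

def decode_int32_array(buf):
    """Inverse of encode_int32_array."""
    (count,) = struct.unpack_from("<i", buf, 0)
    return list(struct.unpack_from(f"<{count}i", buf, 4))

payload = encode_int32_array([1, 2, 300])
print(len(payload))  # 16 bytes: 4 for the length plus 3 * 4 for the elements
```

Every element costs 4 bytes regardless of its magnitude, which is the overhead the question is asking about.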

Can anyone do significantly better?

The most obvious example of a significant improvement, for boolean arrays, is putting 8 bools in each byte, which we already do.

Note: Saving 7 bits per bool may seem like a waste of time, but when you are dealing with large volumes of data (which we are), it adds up very fast.
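The bool packing described above could look roughly like this. It is a minimal Python sketch, not our production code; the helper names and the choice of least-significant-bit-first ordering are assumptions, and the element count is assumed to be sent separately (for example as a length prefix).

```python
def pack_bools(flags):
    """Pack a sequence of bools into bytes, 8 flags per byte (LSB first)."""
    out = bytearray((len(flags) + 7) // 8)  # round up to whole bytes
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

def unpack_bools(data, count):
    """Recover `count` bools; the count must be transmitted alongside the data."""
    return [bool(data[i // 8] & (1 << (i % 8))) for i in range(count)]

flags = [True, False, True, True, False, False, False, False, True]
packed = pack_bools(flags)
print(len(packed))  # 2 bytes for 9 flags
```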

Note: We want to avoid general compression algorithms because of the latency associated with them. Remoting only supports buffered requests/responses (no chunked encoding). I realize there is a fine line between compression and optimal serialization, but our tests indicate we can afford very specific serialization optimizations at very little cost in latency, whereas reprocessing the entire buffered response into a new compressed buffer is too expensive.

Greg Dean asked Feb 10 '09 21:02


1 Answer

(relates to messages/classes, not just primitives)

Google designed "protocol buffers" for this type of scenario (they shift a huge amount of data around) - their format is compact (using things like base-128 encoding) but extensible and version tolerant (so clients and servers can upgrade easily).
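To illustrate the base-128 ("varint") idea: each byte carries 7 bits of the value, lowest group first, and the high bit signals that more bytes follow, so small numbers take fewer bytes. The Python sketch below is only an illustration of that encoding for non-negative integers; the function names are hypothetical and this is not the API of either library mentioned here.

```python
def encode_varint(n):
    """Encode a non-negative int as a base-128 varint (low 7-bit group first)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf, pos=0):
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_varint_array(values):
    """Varint element count followed by one varint per element."""
    return encode_varint(len(values)) + b"".join(encode_varint(v) for v in values)

print(encode_varint(300).hex())             # ac02
print(len(encode_varint_array([1, 2, 300])))  # 5 bytes, versus 16 fixed-width
```

The saving depends on the data: values below 128 take one byte each, while values that need all 32 bits take five bytes, so this helps most when the numbers are mostly small. Negative numbers would need to be mapped to non-negative ones first (protocol buffers uses ZigZag encoding for that), which this sketch leaves out.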

In the .NET world, I can recommend 2 protocol buffers implementations:

  • protobuf-net (by me)
  • dotnet-protobufs (by Jon Skeet)

For info, protobuf-net has direct support for ISerializable and remoting (it is part of the unit tests). There are performance/size metrics here.

And best of all, all you do is add a few attributes to your classes.

Caveat: it doesn't claim to be the theoretical best, but it is pragmatic and easy to get right: a compromise between performance, portability and simplicity.

Marc Gravell answered Sep 29 '22 08:09