
How does MicroStream (de)serialization work?

I was wondering how the serialization of MicroStream works in detail. Since it is described as "Super-Fast", does it have to rely on code generation, or is it based on reflection?
How would it perform in comparison to Protobuf serialization, which relies on code generation that reads directly from the Java fields and writes them into a ByteBuffer, and vice versa?
Using reflection would drastically decrease performance when serializing objects at a huge scale, wouldn't it?

I'm looking for a fast way to transmit and persist objects for a multiplayer game, and every millisecond counts. :)

Thanks in advance!

PS: Since I don't have enough reputation, I cannot create the "microstream" tag. https://microstream.one/

asked Nov 06 '19 by Leo Hilbert

1 Answer

I am the lead developer of MicroStream. (This is not an alias account. I really just created it. I've been reading StackOverflow for 10 years or so, but never had a reason to create an account. Until now.)

On every initialization, MicroStream analyzes the current runtime's versions of all required entity and value type classes and derives optimized metadata from them. The same is done when encountering a class at runtime that was unknown so far. The analysis is done via reflection, but since it is only done once for every handled class, the reflection performance cost is negligible. The actual storing and loading, or serialization and deserialization, is done via optimized framework code based on the created metadata.
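
As an illustration of the principle (a simplified sketch only, not MicroStream's actual code; the registry class and its names are hypothetical): analyze a class reflectively once, cache the resulting metadata, and reuse it for every subsequent store/load.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: per-class metadata is derived once via reflection and cached,
// so the actual store/load hot path never pays the reflective analysis cost again.
final class TypeMetadataRegistry
{
    private final Map<Class<?>, List<Field>> cache = new ConcurrentHashMap<>();

    List<Field> metadataFor(final Class<?> type)
    {
        // computeIfAbsent guarantees the analysis runs only once per handled class
        return this.cache.computeIfAbsent(type, t ->
        {
            final List<Field> fields = new ArrayList<>();
            for(Class<?> c = t; c != null; c = c.getSuperclass())
            {
                for(final Field f : c.getDeclaredFields())
                {
                    if(!Modifier.isStatic(f.getModifiers()))
                    {
                        f.setAccessible(true); // one-time setup cost
                        fields.add(f);
                    }
                }
            }
            return fields;
        });
    }
}
```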

If a class layout changes, the type analysis creates a mapping from the field layout that the class' instances were stored in to that of the current class. This happens automatically if possible (for unambiguous changes, or via some configurable heuristics), otherwise via a user-provided mapping. Performance stays the same, since the JVM does not care whether it (simplified speaking) copies a loaded value #3 to position #3 or to position #5. It's all in the metadata.
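
Conceptually, such a mapping can be pictured like this (a hedged sketch with made-up names; the real legacy type mapping is of course more elaborate):

```java
// Hypothetical sketch: metadata maps each stored field index to the index of the
// corresponding field in the current class layout. Loading just copies values
// through that mapping; -1 marks a stored field that no longer exists.
final class LegacyTypeMapping
{
    private final int[] storedToCurrent; // e.g. {0, 1, -1, 4} for a changed layout

    LegacyTypeMapping(final int[] storedToCurrent)
    {
        this.storedToCurrent = storedToCurrent;
    }

    void copy(final Object[] storedValues, final Object[] currentValues)
    {
        for(int storedIndex = 0; storedIndex < storedValues.length; storedIndex++)
        {
            final int currentIndex = this.storedToCurrent[storedIndex];
            if(currentIndex >= 0)
            {
                currentValues[currentIndex] = storedValues[storedIndex];
            }
        }
    }
}
```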

ByteBuffers are used, more precisely direct ByteBuffers, but only as an anchor for off-heap memory to work on via direct "Unsafe" low-level operations. If you are not familiar with "Unsafe" operations, a short and simple notion is: it's as direct and fast as C++ code. You can do anything you want very fast and close to memory, but you are also responsible for everything. For more details, google "sun.misc.Unsafe".
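
A minimal, hedged example of that pattern (not MicroStream code; it assumes a HotSpot-like JVM where sun.misc.Unsafe and the Buffer.address field are available):

```java
import java.lang.reflect.Field;
import java.nio.Buffer;
import java.nio.ByteBuffer;
import sun.misc.Unsafe;

// Illustrative only: a direct ByteBuffer acts as the anchor for a block of
// off-heap memory, which is then read and written via Unsafe at raw addresses.
public final class OffHeapDemo
{
    public static void main(final String[] args) throws Exception
    {
        final Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        final Unsafe unsafe = (Unsafe)theUnsafe.get(null);

        final ByteBuffer buffer = ByteBuffer.allocateDirect(16);

        // resolve the buffer's native base address via its field offset
        final long addressOffset = unsafe.objectFieldOffset(Buffer.class.getDeclaredField("address"));
        final long baseAddress   = unsafe.getLong(buffer, addressOffset);

        unsafe.putLong(baseAddress, 42L);                // raw write, no bounds checks
        System.out.println(unsafe.getLong(baseAddress)); // 42
    }
}
```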

No code is generated. No byte code hacking, tacit replacement of instances by proxies or similar monkey business is used. On the technical level, it's just a Java library (including "Unsafe" usage), but with a lot of properly devised logic.

As a side note: reflection is not as slow as it is commonly considered to be. Not any more. It was, but it has been optimized considerably in past Java versions. It's only slow if every operation has to do all the class analysis, field lookups, etc. anew (which an awful lot of frameworks seem to do because they are just badly written). If the fields are collected (set accessible, etc.) once and then cached, reflection is actually surprisingly fast.
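
A tiny hypothetical example of the difference between those two usage patterns (class and field names made up):

```java
import java.lang.reflect.Field;

final class ReflectionCostDemo
{
    static final class Point { int x = 7; }

    // slow pattern: re-resolve and re-prepare the field on every single call
    static int readSlow(final Point p) throws Exception
    {
        final Field f = Point.class.getDeclaredField("x");
        f.setAccessible(true);
        return f.getInt(p);
    }

    // fast pattern: resolve once, cache, then only do the cheap access
    static final Field CACHED_X;
    static
    {
        try
        {
            CACHED_X = Point.class.getDeclaredField("x");
            CACHED_X.setAccessible(true);
        }
        catch(final NoSuchFieldException e)
        {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int readFast(final Point p) throws Exception
    {
        return CACHED_X.getInt(p);
    }
}
```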

Regarding the comparison to Protobuf-Serialization:

I can't say anything specific about it since I haven't used Protocol Buffers and I don't know how it works internally. As usual with complex technologies, a truly meaningful comparison might be pretty difficult to do since different technologies have different optimization priorities and limitations.

Most serialization approaches give up referential consistency and only store "data": if two objects reference a third, deserialization will create TWO instances of that third object. Like this: A->C<-B ==serialization==> A->C1 B->C2. This basically breaks/ruins/destroys object graphs and makes serialization of cyclic graphs impossible, since it creates an endlessly cascading replication. See JSON serialization, for example. Funny stuff. Even Brian Goetz' draft for a Java "Serialization 2.0" includes that limitation (see "Limitations" at http://cr.openjdk.java.net/~briangoetz/amber/serialization.html) (and another one which breaks the separation of concerns).
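
For instance, a typical data-oriented JSON round trip duplicates the shared instance (a hedged sketch using Gson purely as an example; other JSON mappers behave similarly):

```java
import com.google.gson.Gson;

// Illustration of the A->C<-B problem with a data-only (de)serializer.
public final class SharedReferenceDemo
{
    static final class C { int value = 1; }
    static final class Holder { C a; C b; }

    public static void main(final String[] args)
    {
        final Holder holder = new Holder();
        final C shared = new C();
        holder.a = shared;
        holder.b = shared;           // a and b reference the SAME instance

        final Gson gson = new Gson();
        final String json = gson.toJson(holder);      // C's data is written twice
        final Holder copy = gson.fromJson(json, Holder.class);

        System.out.println(holder.a == holder.b);     // true
        System.out.println(copy.a == copy.b);         // false: C became C1 and C2
        // a cyclic reference back into the graph would typically not serialize at all
    }
}
```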

MicroStream does not have that limitation. It handles arbitrary object graphs properly without ruining their references. Keeping referential consistency intact is by far not "trying to do too much", as he writes. It is "doing it properly". One just has to know how to do it properly. And it is even rather trivial if done correctly. So, depending on how many limitations Protobuf-Serialization has ("pacts with the devil"), it might be hardly or even not at all comparable to MicroStream in general.

Of course, you can always create some performance comparison tests for your particular requirements and see which technology suits you best. Just make sure you are aware of the limitations a certain technology imposes on you (ruined referential consistency, forbidden types, required annotations, required default constructor / getters / setters, etc.). MicroStream has none*.

(*) within reason: Serializing/storing system-internals (e.g. Thread) or non-entities (like lambdas or proxy instances) is, while technically possible, intentionally excluded.

answered by MSTM