I've been lately trying to learn more and generally test Java's serialization for both work and personal projects and I must say that the more I know about it, the less I like it. This may be caused by misinformation though so that's why I'm asking these two things from you all:
1: On byte level, how does serialization know how to match serialized values with some class?
One of my problems right here is that I made a small test with ArrayList containing values "one", "two", "three". After serialization the byte array took 78 bytes which seems awfully lot for such low amount of information(19+3+3+4 bytes). Granted there's bound to be some overhead but this leads to my second question:
2: Can serialization be considered a good method for persisting objects at all? Now obviously if I'd use some homemade XML format the persistence data would be something like this
<object> <class="java.util.ArrayList"> <!-- Object array inside Arraylist is called elementData --> <field name="elementData"> <value>One</value> <value>Two</value> <value>Three</value> </field> </object>
which, like XML in general, is a bit bloated and takes 138 bytes(without whitespaces, that is). The same in JSON could be
{ "java.util.ArrayList": { "elementData": [ "one", "two", "three" ] } }
which is 75 bytes so already slightly smaller than Java's serialization. With these text-based formats it's of course obvious that there has to be a way to represent your basic data as text, numbers or any combination of both.
So to recap, how does serialization work on byte/bit level, when it should be used and when it shouldn't be used and what are real benefits of serialization besides that it comes standard in Java?
Serialization is the process of converting an object into a stream of bytes to store the object or transmit it to memory, a database, or a file. Its main purpose is to save the state of an object in order to be able to recreate it when needed. The reverse process is called deserialization.
If you make a new version of your application in which you've changed your Java classes, for example you add or remove member variables, then you can't read the files that were written with the older version anymore. For this reason, serialization is not so good for long-term storage.
Serialization in Java is a mechanism of writing the state of an object into a byte-stream. It is mainly used in Hibernate, RMI, JPA, EJB and JMS technologies. The reverse operation of serialization is called deserialization where byte-stream is converted into an object.
Serialization in Java allows us to convert an Object to stream that we can send over the network or save it as file or store in DB for later usage. Deserialization is the process of converting Object stream to actual Java Object to be used in our program.
I would personally try to avoid Java's "built-in" serialization:
For details of what the actual bytes mean, see the Java Object Serialization Specification.
There are various alternatives, such as:
(Disclaimer: I work for Google, and I'm doing a port of Protocol Buffers to C# as my 20% project, so clearly I think that's a good bit of technology :)
Cross-platform formats are almost always more restrictive than platform-specific formats for obvious reasons - Protocol Buffers has a pretty limited set of native types, for example - but the interoperability can be incredibly useful. You also need to consider the impact of versioning, with backward and forward compatibility, etc. The text formats are generally hand-editable, but tend to be less efficient in both space and time.
Basically, you need to look at your requirements carefully.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With