I have a question about Java serialization in scenarios where you may need to modify your serializable class and maintain backward compatibility.
I have a deep C# background, so please allow me to compare Java with .NET.
In my Java scenario, I need to serialize objects with Java's runtime serialization mechanism and store the binary data in permanent storage so that I can reuse the objects later. The problem is that the classes may change in the future: fields may be added or removed.
I don't know Java serialization in depth, except for this fantastic article about how not to program in Java when dealing with serialization. As I suspected, serialVersionUID plays a key role in Java serialization, and this is where I need your help.
Apart from the article's example (I know it's bad coding), should that field be left unchanged when Eclipse asks me to update it after I modify the class?
I remember from the .NET world that when I add new fields, I must add the [OptionalField] attribute to the field to keep backward compatibility, so the CLR won't require it in old serialized data. Also, when I need to deprecate a field, I must only remove the public methods and not the private fields.
What are the best-practice guidelines for serialization?
Thank you.
Update: here is an example. Suppose I have class Foo:
public class Foo implements java.io.Serializable {
private String bar;
}
Then I change to:
public class Foo implements java.io.Serializable {
private String bar;
private Integer eggs;
}
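Is compatibility broken between these two versions? If I deserialize an "oldFoo" when I have the "newFoo" compiled, does eggs equal null, or is an exception thrown? I prefer the first, obviously!
Compatible changes include adding fields and adding or removing methods, as long as the class declares a serialVersionUID and keeps it the same. Incompatible changes include moving a class up or down the hierarchy, changing a field's declared type, deleting fields (the specification treats this as incompatible because an older version of the class reading the new data only gets default values), and removing the implementation of the Serializable interface. So in the Foo example, deserializing an old Foo with the new class leaves eggs as null, provided both versions declare the same serialVersionUID.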
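A minimal sketch to check the Foo scenario, assuming both versions declare serialVersionUID = 1L. Two versions of one class can't be loaded side by side in a single JVM, so this uses a test-only trick: it writes a hypothetical FooV1, then patches the class name in the byte stream to FooV2 (or FooV3, whose serialVersionUID differs) before reading it back. The class names and the renameClass helper are illustrative only, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class FooEvolutionDemo {
    // Hypothetical version 1: only has bar.
    static class FooV1 implements Serializable {
        private static final long serialVersionUID = 1L;
        String bar;
        FooV1(String bar) { this.bar = bar; }
    }

    // Hypothetical version 2: same serialVersionUID, one added field.
    static class FooV2 implements Serializable {
        private static final long serialVersionUID = 1L;
        String bar;
        Integer eggs;
    }

    // Like version 2, but someone changed the serialVersionUID.
    static class FooV3 implements Serializable {
        private static final long serialVersionUID = 2L;
        String bar;
        Integer eggs;
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Test-only trick: replaces the first occurrence of a class name in the
    // stream with a same-length name, simulating "old data, new class".
    static byte[] renameClass(byte[] stream, String from, String to) {
        byte[] f = from.getBytes(StandardCharsets.UTF_8);
        byte[] t = to.getBytes(StandardCharsets.UTF_8);
        if (f.length != t.length) throw new IllegalArgumentException("names must have equal length");
        byte[] copy = stream.clone();
        outer:
        for (int i = 0; i + f.length <= copy.length; i++) {
            for (int j = 0; j < f.length; j++) {
                if (copy[i + j] != f[j]) continue outer;
            }
            System.arraycopy(t, 0, copy, i, t.length);
            return copy;
        }
        throw new IllegalArgumentException("class name not found in stream");
    }

    public static void main(String[] args) throws Exception {
        byte[] oldData = serialize(new FooV1("hello"));

        // Same serialVersionUID: the added field just gets its default value.
        FooV2 upgraded = (FooV2) deserialize(renameClass(oldData, "FooV1", "FooV2"));
        System.out.println(upgraded.bar + ", eggs=" + upgraded.eggs); // hello, eggs=null

        // Different serialVersionUID: the old stream is rejected.
        try {
            deserialize(renameClass(oldData, "FooV1", "FooV3"));
        } catch (InvalidClassException e) {
            System.out.println("rejected: " + e.getClass().getSimpleName());
        }
    }
}
```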
Simply put, the serialVersionUID field records the version of a Serializable class, and deserialization uses it to verify that the loaded class and the serialized object are compatible. The serialVersionUID values of different classes are independent, so different classes do not need unique values.
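To see that each class carries its own value, a small sketch can ask ObjectStreamClass.lookup() for the serialVersionUID of a few hypothetical classes (Invoice, Customer, and Undeclared are made-up names). The undeclared one gets a value computed by the JVM.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialVersionDemo {
    // Two unrelated classes may both use 1L: the value is only compared
    // against streams written for the same class.
    static class Invoice implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    static class Customer implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // No declaration: the JVM computes a hash from the class's structure.
    static class Undeclared implements Serializable {
        int value;
    }

    static long uidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(Invoice.class));    // 1
        System.out.println(uidOf(Customer.class));   // 1
        System.out.println(uidOf(Undeclared.class)); // computed; may change when the class is edited
    }
}
```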
Changing a class from Serializable to Externalizable or vice-versa is an incompatible change since the stream will contain data that is incompatible with the implementation of the available class.
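You should always declare a serialVersionUID when you define a class that implements java.io.Serializable. If you don't, the JVM computes one automatically from details of the class (such as its name, interfaces, fields, and methods), and the result can even differ between compilers. Seemingly harmless edits then change the value, and reading previously stored data fails with an InvalidClassException.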
Let's say you have a class MyClass and you want to ensure serialization compatibility going forward, or at least make sure that you don't change its serialized form unintentionally. In most cases, you can use Verify.assertSerializedForm() from the GS Collections test utilities.
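If GS Collections isn't available, the core of such a check can be approximated with the JDK alone. This is a hypothetical assertSerialForm helper, not the GS Collections implementation: it compares the serialVersionUID reported by ObjectStreamClass and a Base64 encoding produced by java.util.Base64 without line breaks, so its strings are formatted differently from the ones shown in this answer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.Base64;

public class SerialFormCheck {
    // Hypothetical stand-in for a serialized-form assertion.
    static void assertSerialForm(long expectedUid, String expectedBase64, Serializable obj) throws IOException {
        long actualUid = ObjectStreamClass.lookup(obj.getClass()).getSerialVersionUID();
        if (actualUid != expectedUid) {
            throw new AssertionError("serialVersionUID differs: expected " + expectedUid + " but was " + actualUid);
        }
        String actual = toBase64(obj);
        if (!actual.equals(expectedBase64)) {
            throw new AssertionError("Serialization was broken, actual form: " + actual);
        }
    }

    // Serializes the object and encodes the raw stream bytes as Base64.
    static String toBase64(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    static class MyClass implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    public static void main(String[] args) throws IOException {
        String form = toBase64(new MyClass());
        System.out.println(form); // paste this into the expected value of the test
        assertSerialForm(1L, form, new MyClass()); // passes until the serialized form changes
    }
}
```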
Start by writing a test that asserts that your class has a serialVersionUID of 0L and has a serial form that's the empty string.
@Test
public void serialized_form()
{
Verify.assertSerializedForm(
0L,
"",
new MyClass());
}
Run the test. It will fail, because the expected String is the Base64 encoding of the serialized object, which is never empty.
org.junit.ComparisonFailure: Serialization was broken. <Click to see difference>
When you click to see the difference, you'll see the actual Base64 encoding. Paste it inside the empty string.
@Test
public void serialized_form()
{
Verify.assertSerializedForm(
0L,
"rO0ABXNyAC9jYXJhbWVsa2F0YS5zaHVrbmlfZ29lbHZhLkV4ZXJjaXNlOVRlc3QkTXlDbGFzc56U\n"
+ "hVp0q+1aAgAAeHA=",
new MyClass());
}
Re-run the test. It's likely to fail again with an error message like this.
java.lang.AssertionError: serialVersionUID's differ expected:<0> but was:<-7019839295612785318>
Paste the new serialVersionUID into the test in place of 0L.
@Test
public void serialized_form()
{
Verify.assertSerializedForm(
-7019839295612785318L,
"rO0ABXNyAC9jYXJhbWVsa2F0YS5zaHVrbmlfZ29lbHZhLkV4ZXJjaXNlOVRlc3QkTXlDbGFzc56U\n"
+ "hVp0q+1aAgAAeHA=",
new MyClass());
}
The test will now pass until you change the serialized form. If you break the test (change the serialized form) by accident, the first thing to do is check that you've specified the serialVersionUID in the Serializable class. If you leave it out, the JVM computes one for you, and that value is quite brittle.
import java.io.Serializable;

public class MyClass implements Serializable
{
private static final long serialVersionUID = -7019839295612785318L;
}
If the test is still broken, you can try to restore the serialized form by marking new fields as transient, taking full control over the serialized form using writeObject(), etc.
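As one example of taking control with writeObject(), this sketch assumes a hypothetical change where the field bar was renamed to name in code. Declaring serialPersistentFields and using putFields()/readFields() keeps a String field called bar in the serialized form, so the stored bytes do not need to change.

```java
import java.io.*;

public class RenamedFieldDemo {
    // Hypothetical: the Java field is now called "name", but the serialized
    // form keeps the old String field "bar" so existing data stays readable.
    static class Foo implements Serializable {
        private static final long serialVersionUID = 1L;

        // Declares the serialized fields explicitly, independent of real fields.
        private static final ObjectStreamField[] serialPersistentFields = {
            new ObjectStreamField("bar", String.class)
        };

        private String name;

        Foo(String name) { this.name = name; }

        String getName() { return name; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            ObjectOutputStream.PutField fields = out.putFields();
            fields.put("bar", name); // store the renamed field under its old name
            out.writeFields();
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            ObjectInputStream.GetField fields = in.readFields();
            name = (String) fields.get("bar", null);
        }
    }

    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Foo copy = (Foo) roundTrip(new Foo("hello"));
        System.out.println(copy.getName()); // hello
        for (ObjectStreamField f : ObjectStreamClass.lookup(Foo.class).getFields()) {
            System.out.println(f.getName()); // bar
        }
    }
}
```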
If the test is still broken, you have to decide whether to find and revert your changes which broke serialization or treat your changes as an intentional change to the serialized form.
When you change the serialized form on purpose, you'll need to update the Base64 String to get the test to pass. When you do, it's crucial that you change the serialVersionUID at the same time. It doesn't matter what number you choose, as long as it's a number you've never used for the class before. The convention is to change it to 2L, then 3L, etc. If you're starting from an IDE-generated serialVersionUID (like -7019839295612785318L in the example), you should still bump the number to 2L, because it's still the second version of the serialized form.
Note: I am a developer on GS collections.
Java's native serialization support is mainly useful for short-term storage or for transmission over a network, so that instances of an application can communicate with little effort. If you're after longer-term storage, I'd suggest you look at an XML serialization technique such as JAXB (part of the JDK through Java 10 and removed in Java 11, so newer projects add it as a separate dependency).
It's best not to use serialization when you need to keep your data for a long period of time. Try using a database or Protocol Buffers (a way of encoding structured data in an efficient yet extensible format).
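JAXB needs a separate dependency on current JDKs, so this sketch swaps in a different, older JDK API, java.beans.XMLEncoder, to illustrate the same idea: the state is stored as named XML properties rather than as a binary class layout. The Foo bean is hypothetical, and XMLEncoder only handles JavaBeans (a public class with a public no-argument constructor and getter/setter pairs).

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class XmlStorageDemo {
    // A JavaBean version of the hypothetical Foo from the question.
    public static class Foo {
        private String bar;
        private Integer eggs;

        public Foo() { }
        public String getBar() { return bar; }
        public void setBar(String bar) { this.bar = bar; }
        public Integer getEggs() { return eggs; }
        public void setEggs(Integer eggs) { this.eggs = eggs; }
    }

    // Writes the bean's non-default properties by name.
    static String toXml(Object bean) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(bean);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Properties missing from the XML keep the values set by the constructor.
    static Object fromXml(String xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Foo foo = new Foo();
        foo.setBar("hello");
        String xml = toXml(foo);
        System.out.println(xml); // eggs is null, so it is not written at all
        Foo copy = (Foo) fromXml(xml);
        System.out.println(copy.getBar() + ", eggs=" + copy.getEggs());
    }
}
```

Note that XMLDecoder can invoke arbitrary methods described in the XML, so it should only read data you wrote yourself.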
If you want to manage the serialized form of the class yourself, you can implement the Externalizable interface and specify how to serialize and deserialize the state of your class. This way, the serialized state can be simpler than the "real" state. For example, a TreeMap has a state that is a red-black tree, while its serialized form is just a list of key/value pairs, and the tree is rebuilt when the object is deserialized (TreeMap itself does this with private writeObject() and readObject() methods rather than Externalizable).
However, if your class is simple and only has some optional fields, you can mark them with the transient keyword so that default serialization skips them. Keep in mind that a transient field is not stored at all, so after deserialization it holds its default value (null, 0, or false). For example:
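A minimal Externalizable sketch in the spirit of the TreeMap example, using a hypothetical WordCounts class: the stream holds only a count followed by key/value pairs, and readExternal() rebuilds the map from them.

```java
import java.io.*;
import java.util.Map;
import java.util.TreeMap;

public class ExternalizableDemo {
    // Hypothetical class: its serialized form is a count followed by
    // key/value pairs, not the internal red-black tree of the TreeMap.
    public static class WordCounts implements Externalizable {
        private static final long serialVersionUID = 1L;
        private final TreeMap<String, Integer> counts = new TreeMap<>();

        // Externalizable requires a public no-argument constructor.
        public WordCounts() { }

        public void add(String word, int count) {
            counts.merge(word, count, Integer::sum);
        }

        public Map<String, Integer> asMap() {
            return counts;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(counts.size());
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue());
            }
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            counts.clear();
            int size = in.readInt();
            for (int i = 0; i < size; i++) {
                String word = in.readUTF();
                int count = in.readInt();
                counts.put(word, count); // the tree is rebuilt entry by entry
            }
        }
    }

    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        WordCounts counts = new WordCounts();
        counts.add("eggs", 2);
        counts.add("bar", 1);
        WordCounts copy = (WordCounts) roundTrip(counts);
        System.out.println(copy.asMap()); // {bar=1, eggs=2}
    }
}
```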
public class Foo implements java.io.Serializable {
private String bar;
private transient Integer eggs;
}
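A quick round-trip sketch of that behavior, assuming Foo also declares a serialVersionUID: eggs is excluded from the serialized fields and comes back as null, even though it was set before writing.

```java
import java.io.*;

public class TransientDemo {
    static class Foo implements Serializable {
        private static final long serialVersionUID = 1L;
        String bar;
        transient Integer eggs; // skipped by default serialization
    }

    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Foo foo = new Foo();
        foo.bar = "hello";
        foo.eggs = 42;
        Foo copy = (Foo) roundTrip(foo);
        System.out.println(copy.bar + ", eggs=" + copy.eggs); // hello, eggs=null
        // Only non-transient instance fields are part of the serialized form.
        for (ObjectStreamField f : ObjectStreamClass.lookup(Foo.class).getFields()) {
            System.out.println(f.getName()); // bar
        }
    }
}
```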
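Unfortunately, I don't have deep knowledge of C#, but from your description I conclude that Java serialization is weaker. The serialVersionUID field is optional; declaring it keeps the version stable when you change the class in ways that don't affect the serialized form, and it also lets compatible field changes such as adding a field go through (the new field gets its default value). If you change the fields incompatibly, for example by changing a field's type, you cannot read previously serialized objects.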
The usual workaround is to implement your own serialization logic, which Java allows: you write your own readObject() and writeObject() methods. These methods should be smart enough to support backward compatibility.
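A minimal sketch of such a readObject(), assuming both versions declare the same serialVersionUID. It uses ObjectInputStream.GetField so that data written before eggs existed gets 0 instead of null. FooV1, FooV2, and the renameClass helper (which patches the class name in the stream so both versions can be tried in one JVM) are hypothetical test scaffolding.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class ReadObjectDefaultsDemo {
    // Hypothetical old version, as it was when the data was stored.
    static class FooV1 implements Serializable {
        private static final long serialVersionUID = 1L;
        String bar;
        FooV1(String bar) { this.bar = bar; }
    }

    // Hypothetical new version: same serialVersionUID, plus a field with a custom default.
    static class FooV2 implements Serializable {
        private static final long serialVersionUID = 1L;
        String bar;
        Integer eggs;

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            ObjectInputStream.GetField fields = in.readFields();
            bar = (String) fields.get("bar", null);
            // Streams written before eggs existed carry no value: fall back to 0.
            eggs = (Integer) fields.get("eggs", Integer.valueOf(0));
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Test-only trick: swaps the first occurrence of a same-length class name.
    static byte[] renameClass(byte[] stream, String from, String to) {
        byte[] f = from.getBytes(StandardCharsets.UTF_8);
        byte[] t = to.getBytes(StandardCharsets.UTF_8);
        if (f.length != t.length) throw new IllegalArgumentException("names must have equal length");
        byte[] copy = stream.clone();
        outer:
        for (int i = 0; i + f.length <= copy.length; i++) {
            for (int j = 0; j < f.length; j++) {
                if (copy[i + j] != f[j]) continue outer;
            }
            System.arraycopy(t, 0, copy, i, t.length);
            return copy;
        }
        throw new IllegalArgumentException("class name not found in stream");
    }

    public static void main(String[] args) throws Exception {
        byte[] oldData = serialize(new FooV1("hello"));
        FooV2 upgraded = (FooV2) deserialize(renameClass(oldData, "FooV1", "FooV2"));
        System.out.println(upgraded.bar + ", eggs=" + upgraded.eggs); // hello, eggs=0
    }
}
```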
Please see the javadoc of java.io.Serializable for more details.