If I have a large object graph that contains many duplicate strings, is there a benefit to intern()ing the strings before serializing them? Will this reduce the amount of data transferred? Will the strings share pointers on the receiving end?
My guess is that the Strings would be de-duped before sending, reducing the size of the data, and that they would all be represented by the same object on the receiving end, but that they would not actually be interned there (meaning one new instance of each string would be created per serialization 'transaction').
It's easy enough to test:
```java
import java.io.*;

class Foo implements Serializable {
    private String x;
    private String y;

    public Foo(String x, String y) {
        this.x = x;
        this.y = y;
    }
}

public class Test {
    public static void main(String[] args) throws IOException {
        // Built at runtime, so x is a different instance from the literal y
        String x = new StringBuilder("hello").append(" world").toString();
        String y = "hello world";
        showSerializedSize(new Foo(x, y)); // equal but distinct instances
        showSerializedSize(new Foo(x, x)); // the same instance twice
    }

    // Prints the number of bytes Java serialization produces for foo
    private static void showSerializedSize(Foo foo) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(baos);
        oos.writeObject(foo);
        oos.close();
        System.out.println(baos.size());
    }
}
```
Results on my machine:
86
77
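So deduplication by content doesn't happen automatically. ObjectOutputStream only writes a back-reference when it sees the same instance again, as in the second case; equal but distinct String instances are each written in full. The 9-byte difference matches that: one full copy of "hello world" (1 tag byte + 2 length bytes + 11 characters = 14 bytes) is replaced by a 5-byte back-reference (1 tag byte + a 4-byte handle).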
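To check the receiving end too, here is a minimal sketch (the class names RoundTrip and Pair are made up for illustration) that serializes a two-field object, reads it back, and compares references with ==. Because a back-reference is resolved to the object already read from the same stream, the same instance written twice should come back as one shared instance; that instance should still be a freshly created String rather than the interned literal.

```java
import java.io.*;

public class RoundTrip {
    // Hypothetical holder with two String fields, mirroring Foo above
    static class Pair implements Serializable {
        private static final long serialVersionUID = 1L;
        final String a;
        final String b;

        Pair(String a, String b) {
            this.a = a;
            this.b = b;
        }
    }

    // Serializes obj to an in-memory buffer and immediately reads it back
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(obj);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(baos.toByteArray()))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        String s = new StringBuilder("hello").append(" world").toString();

        // Same instance written twice: read back as one shared instance...
        Pair shared = (Pair) roundTrip(new Pair(s, s));
        System.out.println(shared.a == shared.b);      // true
        // ...which is a new String, not the interned literal
        System.out.println(shared.a == "hello world"); // false

        // Equal but distinct instances stay distinct after the round trip
        Pair distinct = (Pair) roundTrip(new Pair(s, new String(s)));
        System.out.println(distinct.a == distinct.b);  // false
    }
}
```

So the sharing on the receiving end is per stream: each deserialization creates its own instances, which is what the question guessed.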
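I wouldn't use String.intern() itself, though, as you probably don't want all of these strings in the JVM-wide intern pool. Instead, you could use a HashMap<String, String> as a "temporary" intern pool: map each string to the first instance you saw with that content, and substitute that instance into the graph before serializing. (A plain HashSet<String> can tell you that an equal string is already present, but it has no way to hand back the stored instance.)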
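A minimal sketch of such a temporary pool, assuming you can walk your own graph and replace string fields yourself (StringPool, canonical and DedupBeforeSend are hypothetical names, and a list stands in for the real object graph). Since a HashSet can't return the instance it already holds, this uses a HashMap<String, String> whose computeIfAbsent returns either the stored instance or the newly added one.

```java
import java.io.*;
import java.util.*;

public class DedupBeforeSend {
    // Hypothetical throwaway "intern pool", scoped to one serialization
    static final class StringPool {
        private final Map<String, String> pool = new HashMap<>();

        // Returns the first instance seen with this content
        String canonical(String s) {
            if (s == null) {
                return null;
            }
            return pool.computeIfAbsent(s, k -> k);
        }
    }

    // Number of bytes Java serialization produces for obj
    static int serializedSize(Serializable obj) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(obj);
        }
        return baos.size();
    }

    public static void main(String[] args) throws IOException {
        // 1000 equal strings, each a distinct instance (e.g. parsed from input)
        ArrayList<String> raw = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            raw.add(new StringBuilder("hello").append(" world").toString());
        }

        // Swap every element for its canonical instance before sending
        StringPool pool = new StringPool();
        ArrayList<String> deduped = new ArrayList<>();
        for (String s : raw) {
            deduped.add(pool.canonical(s));
        }

        System.out.println(serializedSize(raw));
        System.out.println(serializedSize(deduped));
    }
}
```

The second size should be much smaller, since every repeat after the first becomes a back-reference. The pool is simply dropped after the write, so nothing lingers in the JVM-wide intern pool.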