Are Interned Strings preserved when serializing?

If I have a large object graph that contains many duplicate strings, is there a benefit to intern()ing the strings before serializing them? Will this reduce the amount of data transferred? Will the strings share pointers on the receiving end?

My guess is that the strings would be de-duped before sending, which would reduce the size of the data, and that they would all be represented by the same object on the receiving end, but that they would not actually be interned there (meaning one new instance of the string would be created for each serialization 'transaction').

James Scriven asked Dec 10 '22 05:12



1 Answer

It's easy enough to test:

import java.io.*;

class Foo implements Serializable {
    private String x;
    private String y;

    public Foo(String x, String y) {
        this.x = x;
        this.y = y;
    }
}

public class Test {
    public static void main(String[] args) throws IOException {
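        // x is built at runtime, so it is a different instance from the
        // literal y even though the two strings are equal.
        String x = new StringBuilder("hello").append(" world").toString();
        String y = "hello world";

        showSerializedSize(new Foo(x, y)); // two equal but distinct instances
        showSerializedSize(new Foo(x, x)); // the same instance twice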
    }

    private static void showSerializedSize(Foo foo) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(baos);
        oos.writeObject(foo);
        oos.close();
        System.out.println(baos.size());
    }
}

Results on my machine:

86
77

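So it looks like the deduping doesn't happen automatically: two equal but distinct String instances are each written in full (86 bytes), and only a second reference to the same instance is written as a short back-reference (77 bytes).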
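
As for the receiving end, the same kind of test can be extended with a round trip. The following is only a sketch: the class names RoundTripDemo and Pair and the helper roundTrip are made up for illustration and are not part of the original test. It checks whether two fields that pointed at one instance before serialization still point at one instance afterwards.

```java
import java.io.*;

public class RoundTripDemo {
    // Hypothetical holder class, analogous to Foo in the test above.
    static class Pair implements Serializable {
        private static final long serialVersionUID = 1L;
        final String x;
        final String y;

        Pair(String x, String y) {
            this.x = x;
            this.y = y;
        }
    }

    // Serializes the pair to a byte array and reads it back.
    static Pair roundTrip(Pair p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(p);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(baos.toByteArray()))) {
            return (Pair) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Two equal strings that are guaranteed to be distinct instances.
        String a = new StringBuilder("hello").append(" world").toString();
        String b = new StringBuilder("hello").append(" world").toString();

        Pair shared = roundTrip(new Pair(a, a));
        Pair distinct = roundTrip(new Pair(a, b));

        // Reference comparison is intentional here.
        System.out.println(shared.x == shared.y);     // same instance after the round trip
        System.out.println(distinct.x == distinct.y); // still two separate instances
    }
}
```

So strings that were shared on the sending side come back as one shared object within that stream, while strings that were merely equal come back as separate objects.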

I wouldn't use String.intern() itself though, as you probably don't want all of these strings in the normal intern pool - but you could always use a HashSet<String> to create a "temporary" intern pool.
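
A minimal sketch of such a temporary pool follows. It swaps in a HashMap<String, String> rather than a HashSet<String>, because a set cannot hand back the instance it already holds; the class name LocalInternPool and its methods are hypothetical and chosen only for this example.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

public class LocalInternPool {
    // Maps each string to its canonical instance. The pool is an ordinary
    // object, so it can be discarded once serialization is done.
    private final Map<String, String> pool = new HashMap<>();

    // Returns the first instance seen that is equal to s.
    public String intern(String s) {
        if (s == null) {
            return null;
        }
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Same measurement as showSerializedSize above, but returning the size.
    static int serializedSize(Object o) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(o);
        }
        return baos.size();
    }

    public static void main(String[] args) throws IOException {
        String a = new StringBuilder("hello").append(" world").toString();
        String b = new StringBuilder("hello").append(" world").toString();

        LocalInternPool pool = new LocalInternPool();
        String[] raw = {a, b};
        String[] canonical = {pool.intern(a), pool.intern(b)};

        // The canonicalized array holds one instance twice, so the second
        // element is written as a back-reference.
        System.out.println(serializedSize(raw) > serializedSize(canonical));
    }
}
```

Running every string through the pool before building the object graph gives the size reduction without leaving anything in the JVM-wide intern pool.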

Jon Skeet answered Dec 25 '22 01:12
