 

Why is Java deserialization CPU-bound?

I have a Java program which builds a fairly complex and big data structure in memory (several GB) and serializes it to disk, and another program which reads the serialized data structure back into memory. I was surprised to notice that the deserialization step is pretty slow, and that it is CPU-bound: top shows 100% CPU usage, but iotop shows only 3 to 5 MB/s of reads, which is very low for what should be sequential reads on a hard drive. The CPU is fairly recent (Core i7-3820), the structure fits in memory, and no swap space is configured.

Why is this so? Is there an alternative way to serialize objects in Java which does not have the CPU as bottleneck?

Here is the deserialization code, in case it matters:

FileInputStream f = new FileInputStream(path);
ObjectInputStream of = new ObjectInputStream(f);
Object obj = of.readObject();
a3nm asked Apr 10 '12 15:04



2 Answers

Deserialization is pretty expensive. If you use the generic (default) deserialization, it relies heavily on reflection and creates a lot of objects.

There are lots of alternatives which are faster and most use generated code instead of reflection.

http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking

You will note that one of the fastest is using Externalizable which may be an option for you. This means adding custom methods for the serialization and deserialization of objects.
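
As a rough illustration of what those custom methods look like, here is a minimal sketch of an Externalizable class. The names ExternalizableSketch, Point and roundTrip are hypothetical and chosen only for this example; they do not come from the benchmark linked above. The class writes and reads its own fields directly, and it must provide a public no-argument constructor so the stream can instantiate it before calling readExternal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

final class ExternalizableSketch {

    /** Hypothetical value class used only for illustration. */
    public static final class Point implements Externalizable {
        private int x;
        private int y;

        /** Externalizable requires a public no-arg constructor. */
        public Point() {
        }

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            // Write the fields by hand, in a fixed order.
            out.writeInt(x);
            out.writeInt(y);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Read the fields back in exactly the order they were written.
            x = in.readInt();
            y = in.readInt();
        }

        public int getX() {
            return x;
        }

        public int getY() {
            return y;
        }
    }

    /** Serializes a Point to a byte array and reads it back. */
    static Point roundTrip(Point value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Point) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Point copy = roundTrip(new Point(3, 4));
        System.out.println(copy.getX() + "," + copy.getY()); // prints "3,4"
    }
}
```

The trade-off is that the field order in writeExternal and readExternal has to be kept in sync by hand whenever the class changes.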

I have written much faster approaches, but these avoid creating any objects, either by recycling them or by using the data in the file in place (i.e. without needing to deserialize it).
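
To make the idea of recycling an object and reading data in place more concrete, here is a minimal sketch. It is not the implementation referred to above. It assumes a hypothetical fixed-width record layout (a 4-byte id followed by an 8-byte price), and the names InPlaceRecordsSketch, RecordView, writeRecords and sumPrices are invented for the example. An in-memory ByteBuffer stands in for the file; a buffer obtained by memory-mapping a file with FileChannel.map exposes the same read methods.

```java
import java.nio.ByteBuffer;

final class InPlaceRecordsSketch {

    /** Assumed layout: int id (4 bytes) followed by double price (8 bytes). */
    static final int RECORD_SIZE = Integer.BYTES + Double.BYTES;

    /** A reusable view that reads fields straight from the buffer. */
    static final class RecordView {
        private ByteBuffer buffer;
        private int offset;

        /** Points this view at record number index; allocates nothing. */
        RecordView moveTo(ByteBuffer buffer, int index) {
            this.buffer = buffer;
            this.offset = index * RECORD_SIZE;
            return this;
        }

        int id() {
            return buffer.getInt(offset);
        }

        double price() {
            return buffer.getDouble(offset + Integer.BYTES);
        }
    }

    /** Builds a buffer of count records: id = i, price = i * 1.5. */
    static ByteBuffer writeRecords(int count) {
        ByteBuffer buffer = ByteBuffer.allocate(count * RECORD_SIZE);
        for (int i = 0; i < count; i++) {
            buffer.putInt(i);
            buffer.putDouble(i * 1.5);
        }
        buffer.flip();
        return buffer;
    }

    /** Sums all prices using one recycled view instead of one object per record. */
    static double sumPrices(ByteBuffer buffer) {
        RecordView view = new RecordView();
        int count = buffer.remaining() / RECORD_SIZE;
        double sum = 0;
        for (int i = 0; i < count; i++) {
            sum += view.moveTo(buffer, i).price();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumPrices(writeRecords(4))); // prints "9.0"
    }
}
```

This only works when the data can be laid out in a predictable binary format, which is a much stronger constraint than what ObjectInputStream accepts.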

Peter Lawrey answered Oct 05 '22 23:10



It's hard to say without looking at this with a profiler or knowing much about the actual hierarchy of your object's structure, but I'm assuming that if it's "fairly complex" and on the order of "several GB" large, you're probably dealing with thousands of individual objects.

My best guess here is that your performance is getting killed by Java reflection. Reflection is used to construct the objects from your stream, and reflective construction is known to be at least two orders of magnitude slower than calling constructors directly within code. So if your object graph contains tons of "small" objects, reflection is going to spend a lot of time reconstructing them.

One thing you could try (if you haven't already) would be to declare the following line at the top of each of your Serializable classes:

private static final long serialVersionUID = [some number]L;

If you don't declare this ID, Java will have to compute it, so you do save some CPU cycles by declaring it.
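
A minimal sketch of the difference, assuming two hypothetical classes named Declared and Undeclared: ObjectStreamClass.lookup(...).getSerialVersionUID() returns the declared constant for the first and a value computed from the class's structure for the second.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

final class SerialVersionUidSketch {

    /** Declares its ID explicitly, so nothing has to be computed. */
    static final class Declared implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    /** No declared ID: the runtime derives one from the class's structure. */
    static final class Undeclared implements Serializable {
        int value;
    }

    /** Returns the serialVersionUID the serialization runtime uses for a class. */
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("declared:   " + uidOf(Declared.class));
        System.out.println("undeclared: " + uidOf(Undeclared.class));
    }
}
```

Declaring the ID also keeps previously serialized data readable after compatible changes to the class, since a computed ID changes when the class's structure does.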

For further reference:

http://oreilly.com/catalog/javarmi/chapter/ch10.html

CodeBlind answered Oct 05 '22 23:10
