
Why is Java's serialization slower than third-party APIs?

While working with sockets and serializing objects over them, I noticed that there are some third-party libraries for faster object serialization in Java, such as Kryo and FST. Until now, I had assumed that Java's built-in serialization was optimized and the fastest option, because it is part of the language and provides a low-level solution. However, these libraries claim to be faster than Java's own serialization.

Can someone explain why Java does not provide the fastest serialization solution? What does it gain in exchange for giving up better performance?

Thanks in advance.

ovunccetin asked Oct 18 '13 10:10




1 Answer

There are several reasons (I am the author of http://code.google.com/p/fast-serialization/).

Reasons:

  • It walks up the class hierarchy for each object, possibly making several readObject/writeObject calls per object.
  • Partially poor coding (improved in 1.7).
  • Some frequently used classes rely on old, slow, and outdated serialization features such as PutField/GetField.
  • Too much temporary object allocation.
  • A lot of validation (versioning, implemented interfaces).
  • Slow Java input/output streams.
  • Reflection is used to set and get field values.
  • Use of JDK collections, which require boxed ("big") numbers such as Integer or Long instead of primitives.
  • The implementation lacks certain algorithmic optimizations :-)
  • On x86, primitives are reordered into network byte order (in Java code, not native code).
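
To illustrate the first point, here is a minimal sketch (not taken from the JDK or FST sources) that counts the per-class writeObject hooks triggered by serializing a single object. The class names Base and Derived and the CALLS list are hypothetical and exist only for this demo.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class HierarchyWalkDemo {
    // Records which class-level writeObject hooks ran (hypothetical helper for this demo).
    static final List<String> CALLS = new ArrayList<>();

    static class Base implements Serializable {
        private static final long serialVersionUID = 1L;
        int baseValue = 1;

        // Invoked by ObjectOutputStream for the Base part of the object.
        private void writeObject(ObjectOutputStream out) throws IOException {
            CALLS.add("Base");
            out.defaultWriteObject();
        }
    }

    static class Derived extends Base {
        private static final long serialVersionUID = 1L;
        int derivedValue = 2;

        // Invoked separately for the Derived part of the same object.
        private void writeObject(ObjectOutputStream out) throws IOException {
            CALLS.add("Derived");
            out.defaultWriteObject();
        }
    }

    // Serializes one Derived instance and returns the hooks that were called, in order.
    static List<String> serializeAndRecord() {
        CALLS.clear();
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(new Derived());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ArrayList<>(CALLS);
    }

    public static void main(String[] args) {
        System.out.println(serializeAndRecord());
    }
}
```

One object leads to one hook call per serializable class in its hierarchy, from superclass to subclass, so deeper hierarchies mean more per-object work.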

To deliver better performance, the JDK would have to give up support for old versioning schemes (e.g. the way readObject/writeObject currently works is suboptimal) and either make features such as versioning support optional or choose more performance-sensitive approaches to them (which would be possible). Additionally, HotSpot could add intrinsics to improve low-level handling of primitives. An API has to be designed with performance in mind, which was probably not the case with JDK serialization.
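
As a rough illustration of what the versioning and validation metadata costs in bytes, the sketch below compares the output size of ObjectOutputStream with a hand-written DataOutputStream encoding of the same two int fields. The Point class and both helper methods are hypothetical, and the comparison only measures size, not speed.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class OverheadDemo {
    // Hypothetical value class used only for this comparison.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Size of the standard serialized form, including stream header and class descriptor.
    static int serializedSize(Point p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Size when only the two int fields are written, with no metadata at all.
    static int rawSize(Point p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(p.x);
            out.writeInt(p.y);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println("ObjectOutputStream bytes: " + serializedSize(p));
        System.out.println("Raw field bytes: " + rawSize(p));
    }
}
```

The raw form is 8 bytes, while the standard form also carries the class name, serialVersionUID, and field descriptors. That metadata is what makes version checks possible, and it is the kind of cost a faster format would have to make optional.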

R.Moeller answered Oct 30 '22 02:10
