Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest serialization/deserialization of Scala case classes

Tags:

redis

scala

If I've got a nested object graph of case classes, similar to the example below, and I want to store collections of them in a redis list, what libraries or tools should I look at that that will give the fastest overall round trip to redis?

This will include:

  • Time to serialize the item
  • network cost of transferring the serialized data
  • network cost of retrieving stored serialized data
  • time to deserialize back into case classes

    case class Person(name: String, age: Int, children: List[Person]) {}
    
like image 931
user2668128 Avatar asked Aug 10 '13 08:08

user2668128


People also ask

Is Scala case class serializable?

All case classes automatically extend Product and Serializable . It looks ugly ? yes.

What is serialization in Scala?

Akka has a built-in Extension for serialization, and it is both possible to use the built-in serializers and to write your own. The serialization mechanism is both used by Akka internally to serialize messages, and available for ad-hoc serialization of whatever you might need it for.

What is serializable object in Scala?

Serializing a Scala object for JSON storage means converting the object to a string and then writing it out to disk. Start by creating a case class and instantiating an object.

What is the difference between serialization and deserialization in API?

Serialization is a mechanism of converting the state of an object into a byte stream. Deserialization is the reverse process where the byte stream is used to recreate the actual Java object in memory.


2 Answers

UPDATE (2018): scala/pickling is no longer actively maintained. There are hoards of other libraries that have arisen as alternatives which take similar approaches but which tend to focus on specific serialization formats; e.g., JSON, binary, protobuf.

Your use case is exactly the targeted use case for scala/pickling (https://github.com/scala/pickling). Disclaimer: I'm an author.

Scala/pickling was designed to be a faster, more typesafe, and more open alternative to automatic frameworks like Java or Kryo. It was built in particular for distributed applications, so serialization/deserialization time and serialized data size take a front seat. It takes a different approach to serialization all together- it generates pickling (serialization) code inline at the use-site at compile-time, so it's really very fast.

The latest benchmarks are in our OOPSLA paper- for the binary pickle format (you can also choose others, like JSON) scala/pickling is consistently faster than Java and Kryo, and produces binary representations that are on par or smaller than Kryo's, meaning less latency when passing your pickled data over the network.

For more info, there's a project page: http://lampwww.epfl.ch/~hmiller/pickling

And a ScalaDays 2013 talk from June on Parley's.

We'll also be presenting some new developments in particular related to dealing with sending closures over the network at Strange Loop 2013, in case that might also be a pain point for your use case.

As of the time of this writing, scala/pickling is in pre-release, with our first stable release planned for August 21st.

like image 132
Heather Miller Avatar answered Oct 11 '22 17:10

Heather Miller


Update:

You must be careful to use the serialize methods from JDK. The performance is not great and one small change in your class will make the data unable to deserialize.


I've used scala/pickling but it has a global lock while serializing/deserializing.

So instead of using it, I write my own serialization/deserialization code like this:

import java.io._

object Serializer {

  def serialize[T <: Serializable](obj: T): Array[Byte] = {
    val byteOut = new ByteArrayOutputStream()
    val objOut = new ObjectOutputStream(byteOut)
    objOut.writeObject(obj)
    objOut.close()
    byteOut.close()
    byteOut.toByteArray
  }

  def deserialize[T <: Serializable](bytes: Array[Byte]): T = {
    val byteIn = new ByteArrayInputStream(bytes)
    val objIn = new ObjectInputStream(byteIn)
    val obj = objIn.readObject().asInstanceOf[T]
    byteIn.close()
    objIn.close()
    obj
  }
}

Here is an example of using it:

case class Example(a: String, b: String)

val obj = Example("a", "b")
val bytes = Serializer.serialize(obj)
val obj2 = Serializer.deserialize[Example](bytes)
like image 27
Bin Wang Avatar answered Oct 11 '22 19:10

Bin Wang