I'd like to do serialization in Scala. I've seen the likes of sjson and the @serializable annotation; however, I have been unable to see how to get them to deal with one major hurdle: type erasure and generics in libraries.
Take for example the Graph for Scala Library. I make heavy use of it in my code and would like to write several objects holding graphs to disk throughout my code for later analysis. However, many times the node and edge types are encapsulated in generic type arguments of another class I have. How can I properly serialize these classes without either modifying the library itself to deal with reflection or "dirtying" my code by importing a large number of Type Classes (serialization according to how an object is being viewed is wholly unsatisfying anyways...)?
Example:
class Container[N](val g: Graph[N, DiEdge]) {
  ...
}
// in another file
def myMethod[N](container: Container[N]): Unit = {
  // <serialize container somehow here>
}
Serialization is important when persisting data to disk or transferring data over the network. The upickle library makes it easy to serialize Scala case classes. Serializing an object means taking the data stored in an object and converting it to bytes (or a string).
Serializing a Scala object for JSON storage means converting the object to a string and then writing that string out to disk. Start by creating a case class and instantiating an object. Then define a upickle writer and serialize the object to a string.
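The steps above can be sketched roughly as follows. This is a minimal illustration, assuming upickle is on the classpath; the City case class, its fields, and the temporary file name are hypothetical and chosen only for the example.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import upickle.default.{ReadWriter, macroRW, read, write}

// Hypothetical case class, used only for illustration.
final case class City(name: String, population: Int)

object City {
  // Derive a combined JSON reader/writer for City at compile time.
  implicit val rw: ReadWriter[City] = macroRW
}

object UpickleSketch {
  // Serialize a City to a JSON string.
  def toJson(city: City): String = write(city)

  // Parse a JSON string back into a City.
  def fromJson(json: String): City = read[City](json)

  def main(args: Array[String]): Unit = {
    val city = City("Lyon", 500000)
    val json = toJson(city)

    // Persist the JSON string to a temporary file, then read it back.
    val path = Files.createTempFile("city", ".json")
    Files.write(path, json.getBytes(StandardCharsets.UTF_8))
    val restored = fromJson(new String(Files.readAllBytes(path), StandardCharsets.UTF_8))

    println(json)
    println(restored == city)
  }
}
```

Placing the implicit ReadWriter in the companion object means callers do not need an extra import for it to be found.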
When working with Spark and Scala you will often find that your objects will need to be serialized so they can be sent to the Spark worker nodes. Whilst the rules for serialization seem fairly simple, interpreting them in a complex code base can be less than straightforward!
A very simple example: in this case the only thing that will be serialized is a Function1 object whose apply method adds 1 to its input. The Example object won't be serialized. The next case is very similar, but this time the anonymous function accesses the num value.
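The difference between these two cases can be explored without Spark by pushing the functions through plain Java serialization, which is what Spark's default closure serializer relies on. The sketch below is an assumption-laden illustration: Example is written here as a hypothetical non-serializable class (not an object), the method names are invented, and it assumes Scala 2.12 or later, where function literals compile to serializable lambdas.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical class that is deliberately NOT Serializable.
class Example {
  val num = 1

  // Uses only its argument, so nothing from Example needs to be captured.
  def addOne: Int => Int = (x: Int) => x + 1

  // Reads this.num, so the function holds a reference to the whole Example instance.
  def addNum: Int => Int = (x: Int) => x + num

  // Copies num into a local value first, so only that Int is captured.
  def addNumLocal: Int => Int = {
    val n = num
    (x: Int) => x + n
  }
}

object ClosureSketch {
  // Returns true if Java serialization accepts the object, false if it
  // rejects it because something in the object graph is not Serializable.
  def canSerialize(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val example = new Example
    println(s"addOne serializable:      ${canSerialize(example.addOne)}")
    println(s"addNum serializable:      ${canSerialize(example.addNum)}")
    println(s"addNumLocal serializable: ${canSerialize(example.addNumLocal)}")
  }
}
```

Under these assumptions, only addNum should be rejected, because it drags the enclosing instance along. Copying the field into a local value, as addNumLocal does, is a common way to keep the enclosing class out of the closure.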
To report on my findings, Java's XStream does a phenomenal job -- anything and everything, generics or otherwise, can be automatically serialized without any additional input. If you need a quick and no-work way to get serialization going, XStream is it!
However, it should be noted that the output XML will not be particularly concise without your own input. For example, every memory block used by Scala's HashMap will be recorded, even if most of them don't contain anything!
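To illustrate why erased type parameters need not block runtime-based serialization, here is a small sketch that uses plain java.io serialization rather than XStream (a swapped-in technique, chosen because it needs no extra dependency). SimpleGraph and this Container are hypothetical stand-ins, not the Graph for Scala API, and the sketch only works when everything inside the container is itself Serializable.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a library graph type; NOT the Graph for Scala API.
final case class SimpleGraph[N](edges: Map[N, Set[N]])

// Generic wrapper mirroring the shape of Container[N] from the question.
final case class Container[N](g: SimpleGraph[N])

object ErasureSketch {
  // Write any serializable object to a byte array.
  def toBytes(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  // Read an object back; the cast is unchecked because A is erased at runtime.
  def fromBytes[A](bytes: Array[Byte]): A = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  // N is unknown here, yet no type class or TypeTag is required:
  // the serializer inspects the runtime objects, not the static types.
  def roundTrip[N](container: Container[N]): Container[N] =
    fromBytes[Container[N]](toBytes(container))

  def main(args: Array[String]): Unit = {
    val container = Container(SimpleGraph(Map("a" -> Set("b"), "b" -> Set.empty[String])))
    println(roundTrip(container) == container)
  }
}
```

The trade-off is the same one noted for XStream: the output format is dictated by the runtime object layout, so it is convenient but neither compact nor stable across library versions.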
If you are using Graph for Scala and JSON is your serialization format, you can use graph-json directly.
Here is the code and the doc.