
Marshalling/unmarshalling XML in Scala

I am looking at various approaches for marshalling/unmarshalling data between Scala and XML, and I'm interested in getting community feedback (preferably grounded in first-hand knowledge/experience).

We're currently using JAXB, which is fine, but I'm hoping for a pure Scala solution. I'm considering the following approaches:

  1. Use Scala's built-in XML facilities: Scala->XML would be easy, but my guess is that the other direction would be fairly painful. On the other hand, this approach supports arbitrary translation logic.

  2. Data binding: scalaxb seems to be somewhat immature at the moment and doesn't handle our current schema, and I don't know of any other data binding library for Scala. As with JAXB, an extra translation layer would be required to support involved transformations.

  3. XML pickler combinators: The GData Scala Client library provides XML pickler combinators, but recent project activity has been low and I don't know what the current status is.

Questions:

  1. What are your experiences with the approaches/libraries I've listed?
  2. What are the relative advantages and disadvantages of each?
  3. Are there any other approaches or Scala libraries that I should consider?

Edit:

I added some notes on my early impressions of pickler combinators in my own answer to this question, but I'm still very interested in feedback from someone who actually knows the various approaches in depth. What I'm hoping for is a somewhat comprehensive comparison that would help developers choose the right approach for their needs.

Aaron Novstrup asked Jan 12 '11 00:01


2 Answers

I recommend using Scala's built-in XML features. I've just implemented deserialization for a document structure that looks like this:

val bodyXML = <body><segment uri="foo"><segment uri="bar" /></segment></body>

Note that the segments can be nested within each other.

A segment is implemented as follows:

case class Segment(uri: String, children: Seq[Segment])

To deserialize the XML, you do this:

val mySegments = topLevelSegments(bodyXML)

...and the implementation of topLevelSegments is just a few lines of code. Note the recursion, which digs through the XML structure:

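import scala.xml.Node

// Convert each direct <segment> child of <body> into a Segment.
def topLevelSegments(bodyXML: Node): Seq[Segment] =
    (bodyXML \ "segment").map(nodeToSegment)

// Read the uri attribute, then recurse into the nested segments.
def nodeToSegment(n: Node): Segment =
    Segment((n \ "@uri")(0).text, childrenOf(n))

def childrenOf(n: Node): Seq[Segment] = (n \ "segment").map(nodeToSegment)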
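For the opposite direction (Segment to XML), which the question expects to be the easy one, here is a minimal sketch using XML literals. It assumes the scala-xml module is on the classpath (it ships separately from the standard library since Scala 2.11). The names SegmentXml, segmentToNode and segmentsToBody are made up for this illustration.

```scala
import scala.xml.Elem

object SegmentXml {
  case class Segment(uri: String, children: Seq[Segment])

  // Build one <segment> element, recursing into nested segments.
  def segmentToNode(s: Segment): Elem =
    <segment uri={s.uri}>{s.children.map(segmentToNode)}</segment>

  // Wrap the top-level segments in a <body> element.
  def segmentsToBody(segments: Seq[Segment]): Elem =
    <body>{segments.map(segmentToNode)}</body>
}
```

Calling SegmentXml.segmentsToBody on a list of top-level segments should yield a body element with the same nesting as the example document, so each direction costs a couple of short functions.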

Hope that helps.

David answered Sep 21 '22 16:09


For comparison, I implemented David's example using the pickler combinators from the GData Scala Client library:

def segment: Pickler[Segment] =
   wrap(elem("segment", 
           attr("uri", text) 
           ~ rep(segment))) {    // rep = zero or more repetitions
      // convert (uri ~ children) to Segment(uri, children), for unpickling
      Segment.apply 
   } {
      // convert Segment to (uri ~ children), for pickling
      (s: Segment) => new ~(s.uri, s.children)
   }

def body = elem("body", rep(segment))

case class Segment(uri: String, children: List[Segment])

This code is all that is necessary to specify both directions of the translation between Segments and XML, whereas a similar amount of code specifies only one direction of the translation when using the Scala XML library. In my opinion, this version is also easier to understand (once you know the pickler DSL). Of course, as David pointed out in a comment, this approach requires an additional dependency and another DSL that developers have to be familiar with.
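To make the "one specification, both directions" idea concrete, here is a self-contained toy version of the technique. It is not the GData library's API: XNode, Store, emptyStore and the simplified signatures of attr, elem, rep and wrap are all assumptions made for this sketch, and pairs are plain tuples instead of the library's ~ class. Each pickler carries a pickle and an unpickle function, and the combinators compose both at once.

```scala
object PicklerSketch {
  // Tiny XML model so the sketch needs no external library.
  final case class XNode(label: String, attrs: Map[String, String], children: List[XNode])

  // Attributes and child nodes of the element currently being written or read.
  final case class Store(attrs: Map[String, String], nodes: List[XNode])
  val emptyStore: Store = Store(Map.empty, Nil)

  // One value describes both directions of a translation.
  trait Pickler[A] { self =>
    def pickle(a: A, out: Store): Store
    def unpickle(in: Store): Option[(A, Store)]

    // Run this pickler, then `that`, pairing the results.
    def ~[B](that: Pickler[B]): Pickler[(A, B)] = new Pickler[(A, B)] {
      def pickle(ab: (A, B), out: Store): Store =
        that.pickle(ab._2, self.pickle(ab._1, out))
      def unpickle(in: Store): Option[((A, B), Store)] =
        self.unpickle(in).flatMap { case (a, s1) =>
          that.unpickle(s1).map { case (b, s2) => ((a, b), s2) }
        }
    }

    // Convert between the raw shape A and a domain type B.
    def wrap[B](f: A => B)(g: B => A): Pickler[B] = new Pickler[B] {
      def pickle(b: B, out: Store): Store = self.pickle(g(b), out)
      def unpickle(in: Store): Option[(B, Store)] =
        self.unpickle(in).map { case (a, s) => (f(a), s) }
    }
  }

  // A string-valued attribute on the current element (reading does not consume it).
  def attr(name: String): Pickler[String] = new Pickler[String] {
    def pickle(v: String, out: Store): Store = out.copy(attrs = out.attrs + (name -> v))
    def unpickle(in: Store): Option[(String, Store)] = in.attrs.get(name).map(v => (v, in))
  }

  // A child element with the given label; `inner` is by-name so picklers can recurse.
  // Unpickling ignores any content of the element that `inner` leaves unread.
  def elem[A](label: String, inner: => Pickler[A]): Pickler[A] = new Pickler[A] {
    def pickle(a: A, out: Store): Store = {
      val content = inner.pickle(a, emptyStore)
      out.copy(nodes = out.nodes :+ XNode(label, content.attrs, content.nodes))
    }
    def unpickle(in: Store): Option[(A, Store)] = in.nodes match {
      case n :: rest if n.label == label =>
        inner.unpickle(Store(n.attrs, n.children)).map { case (a, _) => (a, in.copy(nodes = rest)) }
      case _ => None
    }
  }

  // Zero or more repetitions; assumes `p` consumes a node each time it succeeds.
  def rep[A](p: => Pickler[A]): Pickler[List[A]] = new Pickler[List[A]] {
    def pickle(as: List[A], out: Store): Store = as.foldLeft(out)((s, a) => p.pickle(a, s))
    def unpickle(in: Store): Option[(List[A], Store)] = {
      @annotation.tailrec
      def loop(acc: List[A], s: Store): (List[A], Store) = p.unpickle(s) match {
        case Some((a, rest)) => loop(a :: acc, rest)
        case None            => (acc.reverse, s)
      }
      Some(loop(Nil, in))
    }
  }

  final case class Segment(uri: String, children: List[Segment])

  // The same shape as the pickler above, written against the toy combinators.
  def segment: Pickler[Segment] =
    elem("segment", (attr("uri") ~ rep(segment)).wrap[Segment] {
      case (uri, children) => Segment(uri, children)
    } { s => (s.uri, s.children) })

  def body: Pickler[List[Segment]] = elem("body", rep(segment))
}
```

With this sketch, body.pickle(segments, emptyStore).nodes produces the tree and body.unpickle(Store(Map.empty, nodes)) reads it back, both derived from the single definition of segment. A real library additionally has to deal with text content, namespaces and error reporting, which this toy ignores.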

Translating XML to Segments is as simple as

body.unpickle(LinearStore.fromFile(filename)) // returns a PicklerResult[List[Segment]]

and translating the other way looks like

xml.XML.save(filename, body.pickle(segments, PlainOutputStore.empty).rootNode)

As far as the combinator library is concerned, it seems to be in decent shape and compiles in Scala 2.8.1. My initial impression is that the library is missing a few niceties (e.g. a oneOrMore combinator) that could be remedied fairly easily. I haven't had time to see how well it handles bad input, but so far it looks sufficient for my needs.

Aaron Novstrup answered Sep 19 '22 16:09