
Scala: Combine xml trees of data?

Tags: xml, scala

I'm looking for the best way to combine a set of XML trees containing similar data into a single one ('union' style).

I did implement a working solution, but the code looks bad and I have a strong gut feeling that there must be a much nicer and more compact way of implementing this.

In the simplest case, what I'm trying to do is combine something like:

<fruit> <apple /> <orange /> </fruit>

and:

<fruit> <banana /> </fruit>

into:

<fruit> <apple/> <orange/> <banana/> </fruit>

Any good ideas on how to make a clean implementation of this in Scala?

Bjorn J asked Nov 05 '22 12:11


2 Answers

With

val appleAndOrange : Elem = <fruit> <apple/> <orange/> </fruit>

and

val banana : Elem = <fruit> <banana/> </fruit>

you can do

val all = appleAndOrange.copy(child = appleAndOrange.child ++ banana.child)

However, this simply takes the label <fruit> from appleAndOrange and ignores the one from banana, which here happens to be the same. You have to decide what checks you want, and what behavior, if they are not the same. The same goes for prefixes, attributes, and scopes.
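As a minimal sketch of one such policy, assuming you would rather refuse the merge than silently keep the first root: the helper below compares label, prefix and attributes before copying. The name mergeIfCompatible is made up for illustration and is not part of scala.xml; scopes are left unchecked here, and the scala-xml module is assumed to be on the classpath.

```scala
import scala.xml.Elem

// Hypothetical helper (not part of scala.xml): append b's children to a,
// but only when both roots agree on label, prefix and attributes.
// Left carries a short description of the first mismatch found.
def mergeIfCompatible(a: Elem, b: Elem): Either[String, Elem] =
  if (a.label != b.label)
    Left(s"label mismatch: ${a.label} vs ${b.label}")
  else if (a.prefix != b.prefix)
    Left(s"prefix mismatch: ${a.prefix} vs ${b.prefix}")
  else if (a.attributes.asAttrMap != b.attributes.asAttrMap)
    Left("attribute mismatch")
  else
    Right(a.copy(child = a.child ++ b.child))
```

Returning an Either keeps the decision with the caller: a Left can be logged, turned into an exception, or replaced by a more lenient merge.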

Didier Dupont answered Nov 09 '22 17:11


Here is another approach worth considering. We essentially build the scala.xml.Elem from a string and make use of some XPath-style querying.

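import scala.xml._

def childUnion(parent: String, a: Elem, b: Elem): Elem = {
    val open: String = "<" + parent + ">"
    val close: String = "</" + parent + ">"
    // element children of every <parent> found in a, followed by those found in b
    val children = (a \\ parent \ "_") ++ (b \\ parent \ "_")
    // the NodeSeq is serialized by string concatenation, then parsed back into an Elem
    XML.loadString(open + children + close)
}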

First we create the open and close tags, which are just strings. Then we construct children by using an XPath-style query.

\\ is an operator on NodeSeq (and therefore on Elem) that searches the Elem itself and all of its descendants for matching elements.

\ is similar, but it only searches the direct children of the Elem.

"_" is the wildcard; it matches any element.

Why not just \? I had trouble figuring this out myself from the documentation, but looking at XPath for Java leads me to believe that \\ includes the Elem itself as well as its descendants, while \ only includes the children. So if we had <parent><x/></parent> \ "parent" we would find nothing, since only <x/> is examined.
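The difference is easy to check for yourself. The snippet below is a small sketch, meant to be pasted into the REPL with the scala-xml module on the classpath; the value names are only illustrative.

```scala
import scala.xml.{Elem, NodeSeq}

val doc: Elem = <parent><x/></parent>

// \ only inspects the direct children of doc, so the root <parent> itself never matches.
val viaChild: NodeSeq = doc \ "parent"

// \\ inspects doc itself and every descendant, so the root <parent> is found.
val viaDescendant: NodeSeq = doc \\ "parent"

// "_" matches any element; here it selects the single child <x/>.
val anyChild: NodeSeq = doc \ "_"
```

Here viaChild is empty, viaDescendant holds the one <parent> element, and anyChild holds the one <x/> element.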

Now this method is not awesome. What can we do to make it a bit more awesome? We'd better make use of Scala's wonderful Option class and the foldLeft method.

def childUnion(parent: String, a: Elem, b: Elem*): Option[Elem] = {
    val parentElem = a \\ parent

    parentElem.size match {
        case 0 => None // no parent present in a
        case _ =>
            // start from a's children and append the children found in each further Elem
            val children = b.foldLeft(parentElem \ "_")((d, c) => d ++ (c \\ parent \ "_"))
            val open: String = "<" + parent + ">"
            val close: String = "</" + parent + ">"
            Some(XML.loadString(open + children + close))
    }
}

This has the added benefit of handling a single Elem, cases where the parent is not present, and a variable number of Elem arguments. Here is a long list of examples I ran while coming up with this final method:

scala> a
res85: scala.xml.Elem = <fruit> <apple></apple> <orange></orange> </fruit>

scala> b
res86: scala.xml.Elem = <fruit> <banana></banana> </fruit>

scala> c
res87: scala.xml.Elem = <box><fruit><apple></apple></fruit></box>

scala> d
res88: scala.xml.Elem = <box><nofruit></nofruit></box>

scala> e
res89: scala.xml.Elem = <fruit></fruit>

scala> val f = <fruit />
f: scala.xml.Elem = <fruit></fruit>

scala> childUnion("fruit", a)
res91: Option[scala.xml.Elem] = Some(<fruit><apple></apple><orange></orange></fruit>)

scala> childUnion("fruit", b)
res92: Option[scala.xml.Elem] = Some(<fruit><banana></banana></fruit>)

scala> childUnion("fruit", c)
res93: Option[scala.xml.Elem] = Some(<fruit><apple></apple></fruit>)

scala> childUnion("fruit", d)
res94: Option[scala.xml.Elem] = None

scala> childUnion("fruit", e)
res95: Option[scala.xml.Elem] = Some(<fruit></fruit>)

scala> childUnion("fruit", a, b)
res96: Option[scala.xml.Elem] = Some(<fruit><apple></apple><orange></orange><banana></banana></fruit>)

scala> childUnion("fruit", a, e)
res97: Option[scala.xml.Elem] = Some(<fruit><apple></apple><orange></orange></fruit>)

scala> childUnion("fruit", a, c)
res98: Option[scala.xml.Elem] = Some(<fruit><apple></apple><orange></orange><apple></apple></fruit>)

scala> childUnion("fruit", a, d)
res99: Option[scala.xml.Elem] = Some(<fruit><apple></apple><orange></orange></fruit>)

scala> childUnion("fruit", e, d)
res100: Option[scala.xml.Elem] = Some(<fruit></fruit>)

scala> childUnion("fruit", d, d)
res101: Option[scala.xml.Elem] = None

scala> childUnion("fruit", f)
res102: Option[scala.xml.Elem] = Some(<fruit></fruit>)
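If the round-trip through a string bothers you, the same lookup can be combined with Elem.copy. What follows is only a sketch under assumptions: childUnionCopy is a made-up name, and unlike childUnion above it keeps the attributes, prefix and scope of the first matching parent found in a, so its results are not guaranteed to print identically to the transcript.

```scala
import scala.xml.{Elem, Node}

// Hypothetical variant of childUnion: same \\ lookup, but the result reuses
// the first matching parent element via copy instead of re-parsing a string.
def childUnionCopy(parent: String, a: Elem, b: Elem*): Option[Elem] =
  (a \\ parent).collectFirst { case e: Elem => e }.map { first =>
    // element children of every matching parent in a, then in each Elem of b, in order
    val children: Seq[Node] = (a +: b).flatMap(e => e \\ parent \ "_")
    first.copy(child = children)
  }
```

As with childUnion, the result is None only when the first Elem contains no matching parent; later arguments without one simply contribute nothing.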
tysonjh answered Nov 09 '22 17:11