 

How do I print a groovy Node with namespace preserved?


When I use this code to output some XML I parsed (and modified) with XmlParser:

XmlParser parser = new XmlParser()
def root = parser.parseText(feedUrl.toURL().text)
def writer = new StringWriter()
new XmlNodePrinter(new PrintWriter(writer)).print(root)
println writer.toString()

the namespace declarations on the root node are not printed, even though they are there in the toString() of root... any ideas?

danb asked Oct 22 '08 20:10



2 Answers

I've just had the same problem and after a bit of fiddling I've found a workaround.

Use XmlSlurper instead of XmlParser, and StreamingMarkupBuilder instead of XmlNodePrinter. Then, inside the closure passed to bind, use the built-in mkp variable to declare the namespaces.

For example, using the source XML from Ted's answer:

import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlUtil
// On Groovy 4+, XmlSlurper also needs: import groovy.xml.XmlSlurper

def root = new XmlSlurper().parseText("http://stackoverflow.com/feeds/question/227447".toURL().text)
def outputBuilder = new StreamingMarkupBuilder()
String result = XmlUtil.serialize(outputBuilder.bind {
    // Declare every namespace up front so it is emitted on the root element
    mkp.declareNamespace('': 'http://www.w3.org/2005/Atom')
    mkp.declareNamespace('creativeCommons': 'http://backend.userland.com/creativeCommonsRssModule')
    mkp.declareNamespace('re': 'http://purl.org/atompub/rank/1.0')
    mkp.yield root
})
println result

This results in:

<?xml version="1.0" encoding="UTF-8"?><feed xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns="http://www.w3.org/2005/Atom" xmlns:re="http://purl.org/atompub/rank/1.0">
<title type="text">How do I print a groovy Node with namespace preserved? - Stack Overflow </title>
<link rel="self" type="application/atom+xml" href="http://stackoverflow.com/feeds/question/227447"/>
<link rel="alternate" type="text/html" href="http://stackoverflow.com/questions/227447"/>
<subtitle>most recent 30 from stackoverflow.com</subtitle>
<updated>2011-02-16T05:13:17Z</updated>
<id>http://stackoverflow.com/feeds/question/227447</id>
<creativeCommons:license>http://www.creativecommons.org/licenses/by-nc/2.5/rdf</creativeCommons:license>
<entry>
<id>http://stackoverflow.com/questions/227447/how-do-i-print-a-groovy-node-with-namespace-preserved</id>
<re:rank scheme="http://stackoverflow.com">2</re:rank>
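
For trying this without network access, here is a minimal sketch of the same approach against an inline document. The sample XML is a hypothetical stand-in for the live feed, and the imports assume Groovy 3 or later, where XmlSlurper lives in groovy.xml (on older versions, drop that import).

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlSlurper
import groovy.xml.XmlUtil

// Hypothetical inline sample standing in for the live feed
def xml = '''<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">
  <title>sample</title>
  <creativeCommons:license>http://www.creativecommons.org/licenses/by-nc/2.5/rdf</creativeCommons:license>
</feed>'''

def root = new XmlSlurper().parseText(xml)

String result = XmlUtil.serialize(new StreamingMarkupBuilder().bind {
    // Declare the namespaces before yielding the parsed tree
    mkp.declareNamespace('': 'http://www.w3.org/2005/Atom')
    mkp.declareNamespace(creativeCommons: 'http://backend.userland.com/creativeCommonsRssModule')
    mkp.yield root
})
println result
```

Every namespace still has to be listed by hand in the bind closure, so this suits feeds whose namespaces are known in advance.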
Damo answered Sep 18 '22 23:09



It looks like XmlNodePrinter is denormalizing the output: rather than printing the namespace declarations on the root, it prints each one on the node that actually needs it.

For example, the feed for this question comes in with the creativeCommons namespace declared on the root element:

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:thr="http://purl.org/syndication/thread/1.0">
  <!-- snip -->
  <creativeCommons:license>http://www.creativecommons.org/licenses/by-nc/2.5/rdf</creativeCommons:license>
  <!-- snip -->
</feed>

When you output the xml using this script:

def root = new XmlParser().parseText("http://stackoverflow.com/feeds/question/227447".toURL().text)
// print() writes to stdout itself and returns nothing, so no println is needed
new XmlNodePrinter().print(root)

It ends up moving the namespace to the license node that needs that namespace. Not a huge deal in this case as there is only a single node in that namespace. If most of the XML were namespaced, it'd probably bloat things quite a bit more.

<feed xmlns="http://www.w3.org/2005/Atom">
  <!-- snip -->
  <creativeCommons:license xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">
    http://www.creativecommons.org/licenses/by-nc/2.5/rdf
  </creativeCommons:license>
  <!-- snip -->
</feed>
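
The same behaviour can be reproduced offline with a small inline document. This is a sketch under assumptions: the sample XML is a hypothetical stand-in for the feed, and the imports assume Groovy 3 or later (on older versions XmlParser and XmlNodePrinter are available without them).

```groovy
import groovy.xml.XmlNodePrinter
import groovy.xml.XmlParser

// Hypothetical inline sample: both namespaces are declared on the root
def xml = '''<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">
  <title>sample</title>
  <creativeCommons:license>http://www.creativecommons.org/licenses/by-nc/2.5/rdf</creativeCommons:license>
</feed>'''

def root = new XmlParser().parseText(xml)

// Capture the printer output in a string so it can be inspected
def writer = new StringWriter()
new XmlNodePrinter(new PrintWriter(writer)).print(root)
def printed = writer.toString()
println printed
```

In the printed text, the creativeCommons declaration should appear on the license element rather than on feed.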

If you actually wanted the nodes normalized, you'd have to tweak XmlNodePrinter to make two passes through the XML: the first to gather all of the namespaces in use, and the second to output them at the top rather than within each namespaced node. The Groovy source code is pretty readable and wouldn't be that hard to modify if you needed this.
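
As a rough illustration of the first pass only, the namespaces in use can be gathered by walking the parsed tree. This is a sketch, not a patch to XmlNodePrinter: it looks at element names only (not attribute names), the sample XML is hypothetical, and the import assumes Groovy 3 or later.

```groovy
import groovy.xml.XmlParser

// Hypothetical inline sample standing in for the feed
def xml = '''<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">
  <title>sample</title>
  <creativeCommons:license>http://www.creativecommons.org/licenses/by-nc/2.5/rdf</creativeCommons:license>
</feed>'''

def root = new XmlParser().parseText(xml)

// First pass: map each prefix used by an element to its namespace URI.
// Namespaced elements carry a QName as their name; plain ones carry a String.
def namespaces = [:]
root.depthFirst().each { node ->
    if (node instanceof Node && !(node.name() instanceof String)) {
        def qname = node.name()
        namespaces[qname.prefix] = qname.namespaceURI
    }
}
println namespaces
```

A second pass would then write these declarations once on the root element and skip them on the descendants.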

Ted Naleid answered Sep 21 '22 23:09
