Groovy: how to parse xml and preserve namespaces and schemaLocations

Tags: java, groovy

I'm trying to use Groovy to simply add a node to an XML document at a particular location. My source XML looks like this:

<s1:RootNode
   xmlns:s1="http://localhost/s1schema"
   xmlns:s2="http://localhost/s2schema"
   xsi:schemaLocation="http://localhost/s1schema s1schema.xsd 
   http://localhost/s2schema s2schema.xsd">
 <s1:aParentNode>
  <s2:targetNode>
   <s2:childnode1 />
   <s2:childnode2 />
   <s2:childnode3 />
   <s2:childnode4 />
 </s2:targetNode>
</s1:aParentNode>
</s1:RootNode>

I'd like to simply add a new child node alongside the existing ones, so that the output looks like this:

<s1:RootNode
    xmlns:s1="http://localhost/s1schema"
    xmlns:s2="http://localhost/s2schema"
    xsi:schemaLocation="http://localhost/s1schema s1schema.xsd 
    http://localhost/s2schema s2schema.xsd">
 <s1:aParentNode>    
   <s2:targetNode>
     <s2:childnode1 />
     <s2:childnode2 />
     <s2:childnode3 />
     <s2:childnode4 />
     <s2:childnode5 >value</s2:childnode5>
   </s2:targetNode>
  </s1:aParentNode>
 </s1:RootNode>

To do this I have the following simple Groovy script:

def data = 'value'
def root = new XmlSlurper(false, true).parseText(sourceXML)
root.'aParentNode'.'targetNode'.appendNode {
    's2:childnode5'(data)
}
groovy.xml.XmlUtil.serialize(root)

However, when I do this, the namespaces and schemaLocations applied to the root node are removed, and the namespace (but not the schema location) is added to each of the child nodes.

This is causing validation issues downstream.

How do I simply process this XML, performing no validation, leaving the XML as is, and adding a single node in a namespace I specify?

One note: we process many messages and I won't know the outermost namespace (s1 in the above example) in advance. Even so, I'm really just looking for a technique that does a "dumber" processing of XML.

Thanks!

Beta033 asked Mar 01 '13 21:03


2 Answers

First, I had to add xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" to define your xsi namespace. Without it I would receive a SAXParseException for the unbound xsi prefix.
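As a minimal sketch of that first point, the snippet below parses two cut-down documents with a namespace-aware XmlSlurper: one that uses the xsi prefix without declaring it, and one that binds it on the root. The documents and the `parses` helper are hypothetical, made up for illustration. The import assumes Groovy 3 or later, where XmlSlurper lives in groovy.xml; on older versions the class is in groovy.util and needs no import.

```groovy
import groovy.xml.XmlSlurper
import org.xml.sax.SAXParseException

// Hypothetical cut-down document: the xsi prefix is used but never declared.
def unbound = '''<s1:RootNode xmlns:s1="http://localhost/s1schema"
    xsi:schemaLocation="http://localhost/s1schema s1schema.xsd"/>'''

// Same document with the xsi prefix bound on the root element.
def bound = '''<s1:RootNode xmlns:s1="http://localhost/s1schema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://localhost/s1schema s1schema.xsd"/>'''

// Hypothetical helper: true if a namespace-aware, non-validating parse succeeds.
def parses = { String xml ->
    try {
        new XmlSlurper(false, true).parseText(xml)
        return true
    } catch (SAXParseException e) {
        return false
    }
}

assert !parses(unbound)   // unbound prefix is rejected by the parser
assert parses(bound)      // declaring xmlns:xsi makes the document parse
```

The parser may also print the fatal error to stderr for the first document; that is expected here.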

Additionally, I consulted this question on successfully appending a namespaced xml node to an existing document.

Finally, I had to use StreamingMarkupBuilder to work around the namespaces being moved. Basically, by default the serializer moves each namespace declaration to the first node that actually uses that namespace. In your case it was moving the s2 namespace attribute to the "targetNode" tag. The following code produces the results you want, but you still have to know the correct namespaces in order to declare them on the StreamingMarkupBuilder.

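 import groovy.xml.StreamingMarkupBuilder
 import groovy.xml.XmlUtil

 // sourceXML is the question's document with xmlns:xsi added to the root
 def root = new XmlSlurper(false, true).parseText(sourceXML)
 // Parse the new node as its own fragment so it carries the s2 namespace
 def data = '<s2:childnode5 xmlns:s2="http://localhost/s2schema">value</s2:childnode5>'
 def xmlFragment = new XmlSlurper(false, true).parseText(data)
 root.'aParentNode'.'targetNode'.appendNode(xmlFragment)

 // Declare the namespaces up front so they are written on the root element
 def outputBuilder = new StreamingMarkupBuilder()
 String result = XmlUtil.serialize(outputBuilder.bind {
     mkp.declareNamespace('s1': "http://localhost/s1schema")
     mkp.declareNamespace('s2': "http://localhost/s2schema")
     mkp.yield root
 })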

purgatory101 answered Nov 15 '22 00:11


XmlSlurper (or XmlParser) does not handle namespaces if you set the second parameter of the constructor:

XmlSlurper (boolean validating, boolean namespaceAware)

to false:

def root = new XmlSlurper(false, false).parseText( sourceXML )

Without setting namespaceAware to false, I also saw strange behavior from the parser. With it set to false, the parser leaves the XML as is, with no namespace changes.
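Put together with the question's append step, this approach might look like the sketch below. It rests on several assumptions: the sample document is a trimmed copy of the one in the question with xmlns:xsi added (XmlUtil.serialize re-parses its output, so the prefix has to be bound), the s2 prefix of the target node is known, and the import targets Groovy 3 or later (on older versions XmlSlurper is in groovy.util and needs no import). Because the parser is not namespace-aware, the lookup matches the literal prefixed name rather than the namespace URI.

```groovy
import groovy.xml.XmlSlurper
import groovy.xml.XmlUtil

// Trimmed copy of the question's document, with xmlns:xsi added (assumption).
def sourceXML = '''<s1:RootNode
    xmlns:s1="http://localhost/s1schema"
    xmlns:s2="http://localhost/s2schema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://localhost/s1schema s1schema.xsd http://localhost/s2schema s2schema.xsd">
  <s1:aParentNode>
    <s2:targetNode>
      <s2:childnode1/>
      <s2:childnode4/>
    </s2:targetNode>
  </s1:aParentNode>
</s1:RootNode>'''

// namespaceAware = false: prefixes stay part of the element names, and the
// xmlns:* declarations are kept as ordinary attributes on the root.
def root = new XmlSlurper(false, false).parseText(sourceXML)

// Find the target by its prefixed name anywhere in the tree, so the outer
// prefix (s1 here) does not need to be known in advance.
def target = root.'**'.find { it.name() == 's2:targetNode' }
target.appendNode {
    's2:childnode5'('value')
}

String result = XmlUtil.serialize(root)
println result
```

The trade-off is that a document using a different prefix for the same namespace URI would not match the lookup, so this only suits messages whose prefixes are stable.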

KarelHusa answered Nov 14 '22 23:11