
Merging Documents while preserving xsi:type

I have 2 Document objects that contain similar XML. For example:

<tt:root xmlns:tt="http://myurl.com/">
  <tt:child/>
  <tt:child/>
</tt:root>

And the other one:

<ns1:root xmlns:ns1="http://myurl.com/" xmlns:ns2="http://myotherurl.com/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <ns1:child/>
  <ns1:child xsi:type="ns2:SomeType"/>
</ns1:root>

I need to merge them into one document with one root element and four child elements. The problem is that if I use the document.importNode function to do the merging, it handles the namespaces properly everywhere except in the value of the xsi:type attribute. So what I get as a result is this:

<tt:root xmlns:tt="http://myurl.com/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <tt:child/>
  <tt:child/>
  <ns1:child xmlns:ns1="http://myurl.com/"/>
  <ns1:child xmlns:ns1="http://myurl.com/" xsi:type="ns2:SomeType"/>
</tt:root>

As you can see, ns2 is used in xsi:type but is not defined anywhere. Is there any automated way to solve this problem?

Thanks.

ADDED:

If this task is impossible to complete using the default Java DOM libraries, is there some other library I can use instead?

bezmax asked Jun 01 '11 08:06


5 Answers

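If I fix the namespace problem in your second file (by binding the "xsi" prefix) and do the merge using the code below, the namespace bindings are preserved in the output; at least they are here (vanilla 64-bit Java on Windows, build 1.6.0_24). The code copies any xmlns:* declarations that the second root element lacks from the first root element, then imports the first document's children into the second document.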

String s1 = "<!-- 1st XML document here -->";
String s2 = "<!-- 2nd XML document here -->";

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware( true );
DocumentBuilder builder = factory.newDocumentBuilder();

Document doc1 = builder.parse( new ByteArrayInputStream( s1.getBytes() ) );
Document doc2 = builder.parse( new ByteArrayInputStream( s2.getBytes() ) );

Element doc1root = ( Element )doc1.getDocumentElement();
Element doc2root = ( Element )doc2.getDocumentElement();

NamedNodeMap atts1 = doc1root.getAttributes();
NamedNodeMap atts2 = doc2root.getAttributes();

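// Copy namespace declarations (xmlns:*) that doc1's root has but doc2's root lacks
for( int i = 0; i < atts1.getLength(); i++ )
{
    String name = atts1.item( i ).getNodeName();
    if( name.startsWith( "xmlns:" ) )
    {
        if( atts2.getNamedItem( name ) == null )
        {
            // Namespace declarations belong to the reserved xmlns namespace
            doc2root.setAttributeNS( "http://www.w3.org/2000/xmlns/", name, atts1.item( i ).getNodeValue() );
        }
    }
}

// importNode copies each node, so doc1 is left unchanged while doc2 grows
NodeList nl = doc1.getDocumentElement().getChildNodes();
for( int i = 0; i < nl.getLength(); i++ )
{
    Node n = nl.item( i );
    doc2root.appendChild( doc2.importNode( n, true ) );
}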

TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
StreamResult streamResult = new StreamResult( System.out );
transformer.transform( new DOMSource( doc2 ), streamResult );

alexbrn answered Nov 15 '22 02:11


The problem here is the use of namespace prefixes in attribute values, something that was never considered when the namespace standard was created and that the common Java DOM/XML tools cannot easily handle. However, you could solve it as follows:

  1. Before merging, replace every instance of xsi:type="prefix:value" with xsi:type="{namespace}value". By doing this, you no longer depend on the prefix mapping. In your example, xsi:type="ns2:SomeType" would become xsi:type="{http://myotherurl.com/}SomeType".
  2. Merge the documents.
  3. On the result document, reverse the replacement in step 1. The prefix mappings have to be carefully managed to avoid collisions; possibly a new mapping has to be created.
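
The three steps above could be sketched roughly as follows with the standard DOM API. This is only a sketch under assumptions: the class XsiTypeRewriter, its method names and the generated prefixes t0, t1, ... are hypothetical, and only xsi:type attributes are handled (other attribute values that contain prefixes are left untouched).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library
public class XsiTypeRewriter {

    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";
    static final String XMLNS = "http://www.w3.org/2000/xmlns/";

    // Step 1: rewrite xsi:type="prefix:local" as xsi:type="{namespace}local"
    static void expandXsiTypes(Element e) {
        Attr type = e.getAttributeNodeNS(XSI, "type");
        if (type != null) {
            String value = type.getValue().trim();
            int colon = value.indexOf(':');
            String prefix = colon < 0 ? null : value.substring(0, colon);
            // A null prefix asks for the default namespace
            String ns = e.lookupNamespaceURI(prefix);
            if (ns != null) {
                type.setValue("{" + ns + "}" + value.substring(colon + 1));
            }
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) expandXsiTypes((Element) c);
        }
    }

    // Step 3: rewrite "{namespace}local" back to "prefix:local", reusing a prefix
    // that is in scope or declaring an unused one (t0, t1, ...) on the element
    static void collapseXsiTypes(Element e) {
        Attr type = e.getAttributeNodeNS(XSI, "type");
        String value = type == null ? "" : type.getValue();
        int close = value.indexOf('}');
        if (value.startsWith("{") && close > 0) {
            String ns = value.substring(1, close);
            String prefix = e.lookupPrefix(ns);
            if (prefix == null) {
                int n = 0;
                do {
                    prefix = "t" + n++;
                } while (e.lookupNamespaceURI(prefix) != null);
                e.setAttributeNS(XMLNS, "xmlns:" + prefix, ns);
            }
            type.setValue(prefix + ":" + value.substring(close + 1));
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) collapseXsiTypes((Element) c);
        }
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc1 = parse("<tt:root xmlns:tt='http://myurl.com/'><tt:child/></tt:root>");
        Document doc2 = parse("<ns1:root xmlns:ns1='http://myurl.com/'"
            + " xmlns:ns2='http://myotherurl.com/'"
            + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>"
            + "<ns1:child xsi:type='ns2:SomeType'/></ns1:root>");

        expandXsiTypes(doc2.getDocumentElement());                           // step 1
        Element root1 = doc1.getDocumentElement();
        for (Node c = doc2.getDocumentElement().getFirstChild(); c != null; c = c.getNextSibling()) {
            root1.appendChild(doc1.importNode(c, true));                     // step 2
        }
        collapseXsiTypes(root1);                                             // step 3

        Element merged = (Element) root1.getLastChild();
        System.out.println(merged.getAttributeNS(XSI, "type"));
    }
}
```

Because step 3 declares a new prefix only when no prefix for the namespace is already in scope, and picks one that is not bound yet, it should not collide with prefixes from either input document.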

forty-two answered Nov 15 '22 03:11


A single line of XQuery could do the job: construct a new element with the same name as the root element of the context document, then copy into it that root's children together with those from the other document:

declare variable $other external; element {node-name(*)} {*/*, $other/*/*}

Though in XQuery you don't have full control over namespace nodes (at least in XQuery 1.0), it has a copy-namespaces mode setting that can be used to ask for the namespace context to be kept intact, in case the implementation does not preserve it by default.

If XQuery is a viable option, then saxon9he.jar could be the "magic xml library" that you are after.

Here is sample code that shows the query in context, using Saxon's s9api API:

import javax.xml.parsers.DocumentBuilderFactory;
import net.sf.saxon.s9api.*;
import org.w3c.dom.Document;

...

  Document merge(Document context, Document other) throws Exception
  {
    Processor processor = new Processor(false);
    XQueryExecutable executable = processor.newXQueryCompiler().compile(
      "declare variable $other external; element {node-name(*)} {*/*, $other/*/*}");
    XQueryEvaluator evaluator = executable.load();    
    DocumentBuilder db = processor.newDocumentBuilder();
    evaluator.setContextItem(db.wrap(context));
    evaluator.setExternalVariable(new QName("other"), db.wrap(other));
    Document doc =
      DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    processor.writeXdmValue(evaluator.evaluate(), new DOMDestination(doc));
    return doc;
  }

Gunther answered Nov 15 '22 04:11


I would use JAXB with the Mergeable plugin to generate mergeFrom methods in the schema-derived classes. Then:

  • Unmarshal o1 and o2
  • Merge o1 and o2 into o3 using the generated methods
  • Marshal o3

JAXB normally handles xsi:type quite well.

lexicore answered Nov 15 '22 03:11


UPDATE

The approach in my original answer (further down) does not work when the two documents have colliding namespace prefixes: the mapping from the second document replaces the mapping from the first.

Instead, you could copy the namespace declarations from the second document's root element onto each imported node. Since a child element can override a prefix binding declared on its parent, this is valid:

<foo:root xmlns:foo="urn:ROOT">
    <foo:child xmlns:foo="urn:CHILD" xsi:type="foo:child-type">
       ...
    </foo:child>
</foo:root>

In the above XML the namespace bound to the prefix "foo" is overridden in the scope of the child element. You can accomplish this for your use case by doing the following:

import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class Demo {

    public static void main(String[] args) throws Exception  {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();

        File file1 = new File("src/forum231/input1.xml");
        Document doc1 = db.parse(file1);
        Element rootElement1 = doc1.getDocumentElement();

        File file2 = new File("src/forum231/input2.xml");
        Document doc2 = db.parse(file2);
        Element rootElement2 = doc2.getDocumentElement();

        // Copy Child Nodes
        NodeList childNodes2 = rootElement2.getChildNodes();
        for(int x=0; x<childNodes2.getLength(); x++) {
            Node importedNode = doc1.importNode(childNodes2.item(x), true);
            if(importedNode.getNodeType() == Node.ELEMENT_NODE) {
                Element importedElement = (Element) importedNode;
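                // Copy the root's namespace declarations (xmlns:*) onto the imported element,
                // skipping any prefix that the element already declares itself
                NamedNodeMap namedNodeMap2 = rootElement2.getAttributes();
                for(int y=0; y<namedNodeMap2.getLength(); y++) {
                    Node attr2 = namedNodeMap2.item(y);
                    if("http://www.w3.org/2000/xmlns/".equals(attr2.getNamespaceURI())
                            && !importedElement.hasAttributeNS(attr2.getNamespaceURI(), attr2.getLocalName())) {
                        Attr importedAttr = (Attr) doc1.importNode(attr2, true);
                        importedElement.setAttributeNodeNS(importedAttr);
                    }
                }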
            }
            rootElement1.appendChild(importedNode);
        }

        // Output Document
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        DOMSource source = new DOMSource(doc1);
        StreamResult result = new StreamResult(System.out);
        t.transform(source, result);
    }

}

Output

<?xml version="1.0" encoding="UTF-8" standalone="no"?><tt:root xmlns:tt="http://myurl.com/">
  <tt:child/>
  <tt:child/>

  <ns1:child xmlns:ns1="http://myurl.com/" xmlns:ns2="http://myotherurl.com/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
  <ns1:child xmlns:ns1="http://myurl.com/" xmlns:ns2="http://myotherurl.com/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ns2:SomeType"/>
</tt:root>
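
One way to check a merge result such as the output above is to walk the document and look up the prefix of every xsi:type value with Node.lookupNamespaceURI. The following is a minimal sketch of such a check; the class and method names (XsiTypeCheck, unboundXsiTypes) are hypothetical, and the two sample strings are trimmed-down versions of the broken result from the question and of the output above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library
public class XsiTypeCheck {

    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    // Returns every xsi:type value under root whose prefix has no binding in scope
    static List<String> unboundXsiTypes(Element root) {
        List<String> unbound = new ArrayList<String>();
        collect(root, unbound);
        return unbound;
    }

    private static void collect(Element e, List<String> unbound) {
        if (e.hasAttributeNS(XSI, "type")) {
            String value = e.getAttributeNS(XSI, "type").trim();
            int colon = value.indexOf(':');
            // lookupNamespaceURI searches the element and then its ancestors
            if (colon > 0 && e.lookupNamespaceURI(value.substring(0, colon)) == null) {
                unbound.add(value);
            }
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) collect((Element) c, unbound);
        }
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xsi = " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'";
        // ns2 is used in xsi:type but never declared
        String broken = "<tt:root xmlns:tt='http://myurl.com/'" + xsi + ">"
            + "<ns1:child xmlns:ns1='http://myurl.com/' xsi:type='ns2:SomeType'/>"
            + "</tt:root>";
        // ns2 is declared on the imported child itself
        String fixed = "<tt:root xmlns:tt='http://myurl.com/'>"
            + "<ns1:child xmlns:ns1='http://myurl.com/' xmlns:ns2='http://myotherurl.com/'"
            + xsi + " xsi:type='ns2:SomeType'/>"
            + "</tt:root>";
        System.out.println(unboundXsiTypes(parse(broken).getDocumentElement()));
        System.out.println(unboundXsiTypes(parse(fixed).getDocumentElement()));
    }
}
```

The check only looks at xsi:type; it does not verify that the resolved namespace is the one the type originally belonged to, so it would not detect the prefix collision case described in the update.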

ORIGINAL ANSWER

In addition to copying the child elements, you could copy the attributes of the second document's root element onto the first document's root element. This ensures that the resulting document contains the necessary namespace declarations:

import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class Demo {

    public static void main(String[] args) throws Exception  {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();

        File file1 = new File("input1.xml");
        Document doc1 = db.parse(file1);
        Element rootElement1 = doc1.getDocumentElement();

        File file2 = new File("input2.xml");
        Document doc2 = db.parse(file2);
        Element rootElement2 = doc2.getDocumentElement();

        // Copy Attributes
        NamedNodeMap namedNodeMap2 = rootElement2.getAttributes();
        for(int x=0; x<namedNodeMap2.getLength(); x++) {
            Attr importedNode = (Attr) doc1.importNode(namedNodeMap2.item(x), true);
            rootElement1.setAttributeNodeNS(importedNode);
        }

        // Copy Child Nodes
        NodeList childNodes2 = rootElement2.getChildNodes();
        for(int x=0; x<childNodes2.getLength(); x++) {
            Node importedNode = doc1.importNode(childNodes2.item(x), true);
            rootElement1.appendChild(importedNode);
        }

        // Output Document
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        DOMSource source = new DOMSource(doc1);
        StreamResult result = new StreamResult(System.out);
        t.transform(source, result);
    }

}

Output:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<tt:root xmlns:tt="http://myurl.com/" xmlns:ns1="http://myurl.com/" xmlns:ns2="http://myotherurl.com/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <tt:child/>
  <tt:child/>

  <ns1:child/>
  <ns1:child xsi:type="ns2:SomeType"/>
</tt:root>

bdoughan answered Nov 15 '22 02:11