
Doing DOM Node-to-String transformation, but with namespace issues

Tags: java, xml

So we have an XML Document with custom namespaces. (The XML is generated by software we don't control. It's parsed by a namespace-unaware DOM parser; standard Java SE 7/Xerces stuff, but also outside our effective control.) The input data looks like this:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<MainTag xmlns="http://BlahBlahBlah" xmlns:CustomAttr="http://BlitherBlither">
    .... 18 blarzillion lines of XML ....
    <Thing CustomAttr:gibberish="borkborkbork" ... />
    .... another 27 blarzillion lines ....
</MainTag>

The Document we get is usable and xpath-queryable and traversable and so on.

Converting this Document into a text format for writing out to a data sink uses the standard Transformer approach described in a hundred SO "how do I change my XML Document into a Java string?" questions:

// No stylesheet given, so this is an identity transformer: it copies the source to the result.
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StringWriter stringwriter = new StringWriter();
transformer.transform(new DOMSource(theXMLDocument), new StreamResult(stringwriter));
return stringwriter.toString();

and it works perfectly.

But now I'd like to transform individual arbitrary Nodes from that Document into strings. A DOMSource constructor accepts a Node just the same as it accepts a Document (and in fact Document is just a subinterface of Node, so it's the same API as far as I can tell). So passing in an individual Node in the place of "theXMLDocument" in the snippet above works great... until we get to the Thing.

At that point, transform() throws an exception:

java.lang.RuntimeException: Namespace for prefix 'CustomAttr' has not been declared.
    at com.sun.org.apache.xml.internal.serializer.SerializerBase.getNamespaceURI(Unknown Source)
    at com.sun.org.apache.xml.internal.serializer.SerializerBase.addAttribute(Unknown Source)
    at com.sun.org.apache.xml.internal.serializer.ToUnknownStream.addAttribute(Unknown Source)
    ......

That makes sense. (The "com.sun.org.apache" is weird to read, but whatever.) It makes sense because the namespace for the custom attribute was declared at the root node, but now the transformer is starting at a child node and can't see the declarations "above" it in the tree. So I think I understand the problem, or at least the symptom, but I'm not sure how to solve it.

  • If this were a String-to-Document conversion, we'd be using a DocumentBuilderFactory instance and could call .setNamespaceAware(false), but this is going in the other direction.

  • None of the available properties for transformer.setOutputProperty() affect the namespaceURI lookup, which makes sense.

  • There is no corresponding setInputProperty or similar function.

  • The input parser wasn't namespace aware, which is how the "upstream" code got as far as creating its Document to hand to us. I don't know how to hand that particular status flag on to the transforming code, which is what I really would like to do, I think.

  • I believe it's possible to (somehow) add an xmlns:CustomAttr="http://BlitherBlither" attribute to the Thing node, the same as the root MainTag had. But at that point the output is no longer identical XML to what was read in, even if it "means" the same thing, and the text strings are going to be compared later on. We also wouldn't know the attribute was needed until the exception got thrown, at which point we could add it and try again... ick. For that matter, changing the Node would alter the original Document, and this really ought to be a read-only operation.
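The idea in the last bullet can at least be made read-only by working on a clone rather than on the Node itself. What follows is a minimal sketch under stated assumptions, not a definitive fix: the class and method names are hypothetical, the DOM is assumed to be namespace-unaware (so declarations are ordinary attributes named xmlns or xmlns:*), and the output still repeats the inherited declarations on the fragment, so it is not text-identical to the input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper; none of these names come from the question.
final class DetachedNodeSerializer {

    // Parses without namespace awareness, mimicking the upstream parser.
    static Document parseNamespaceUnaware(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(false);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes a deep clone of the element, after copying onto the clone
    // every xmlns / xmlns:* attribute found on the element's ancestors.
    static String toStringWithInheritedDeclarations(Element original) {
        Element copy = (Element) original.cloneNode(true); // original stays untouched
        for (Node ancestor = original.getParentNode();
             ancestor != null && ancestor.getNodeType() == Node.ELEMENT_NODE;
             ancestor = ancestor.getParentNode()) {
            NamedNodeMap attrs = ancestor.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                String name = attrs.item(i).getNodeName();
                boolean isDeclaration = name.equals("xmlns") || name.startsWith("xmlns:");
                // Walking upwards, so the nearest declaration wins.
                if (isDeclaration && !copy.hasAttribute(name)) {
                    copy.setAttribute(name, attrs.item(i).getNodeValue());
                }
            }
        }
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(copy), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseNamespaceUnaware(
            "<MainTag xmlns=\"http://BlahBlahBlah\" xmlns:CustomAttr=\"http://BlitherBlither\">"
            + "<Thing CustomAttr:gibberish=\"borkborkbork\"/></MainTag>");
        Element thing = (Element) doc.getElementsByTagName("Thing").item(0);
        System.out.println(toStringWithInheritedDeclarations(thing));
    }
}
```

This copies all inherited declarations up front instead of waiting for the exception, which avoids the try-and-retry loop but does nothing about the "identical text" requirement.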

Advice? Is there some way of telling the Transformer, "look, don't stress your dimwitted little head over whether the output is legit XML in isolation, it's not going to be parsed back in on its own (but you don't know that), just produce the text and let us worry about its context"?

asked Jan 25 '13 18:01 by Ti Strga


1 Answer

Given your quoted error message "Namespace for prefix 'CustomAttr' has not been declared.", I'm assuming that your input XML is along the lines of:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<MainTag xmlns="http://BlahBlahBlah" xmlns:CustomAttr="http://BlitherBlither">
    .... 18 blarzillion lines of XML ....
    <Thing CustomAttr:attributeName="borkborkbork" ... />
    .... another 27 blarzillion lines ....
</MainTag>

With that assumption, here's my suggestion. You want to extract the "Thing" node from the "big" XML, and a common way to do that is with a little XSLT. You prepare the XSL transformation with:

TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer(new StreamSource(new File("isolate-the-thing-node.xslt")));
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setParameter("elementName", stringWithCurrentThing);    // local name of the element to isolate; set it for each Thing
...

EDIT: @Ti, please note the parameterization instruction above (and below in the xslt).

The file 'isolate-the-thing-node.xslt' could be a flavour of the following:

<xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:custom0="http://BlahBlahBlah"
    xmlns:custom1="http://BlitherBlither"
    version="1.0">
    <xsl:param name="elementName">to-be-parameterized</xsl:param>
    <xsl:output encoding="utf-8" indent="yes" method="xml" omit-xml-declaration="no" />

    <xsl:template match="/*" priority="2" >
        <!--<xsl:apply-templates select="//custom0:Thing" />-->
        <!-- changed to parameterized selection: -->
        <xsl:apply-templates select="custom0:*[local-name()=$elementName]" />
    </xsl:template>

    <xsl:template match="node() | @*" priority="1">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*" />
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
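To see the pieces working together, here is a small end-to-end sketch. It makes several assumptions that are not in the question: the class and method names are hypothetical, the stylesheet is inlined as a string (a trimmed variant of the one above) instead of being read from a file, and the input is handed over as a StreamSource so that it gets parsed namespace-aware. That last point matters because the custom0: name test matches on namespace URI, so it would probably not match elements of a Document built by a namespace-unaware parser.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; the names are for illustration only.
final class IsolateElementDemo {

    // Trimmed, inlined variant of isolate-the-thing-node.xslt.
    static final String STYLESHEET =
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:custom0=\"http://BlahBlahBlah\" version=\"1.0\">"
        + "<xsl:param name=\"elementName\"/>"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        // Root element: emit only the children whose local name matches the parameter.
        + "<xsl:template match=\"/*\" priority=\"2\">"
        + "<xsl:apply-templates select=\"custom0:*[local-name()=$elementName]\"/>"
        + "</xsl:template>"
        // Everything below a selected child: identity copy.
        + "<xsl:template match=\"node() | @*\" priority=\"1\">"
        + "<xsl:copy><xsl:apply-templates select=\"node() | @*\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Returns the serialized children of the root element named elementName.
    static String isolate(String xml, String elementName) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setParameter("elementName", elementName);
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<MainTag xmlns=\"http://BlahBlahBlah\" xmlns:CustomAttr=\"http://BlitherBlither\">"
            + "<Other/><Thing CustomAttr:gibberish=\"borkborkbork\"/></MainTag>";
        System.out.println(isolate(xml, "Thing"));
    }
}
```

Note that xsl:copy also copies the in-scope namespace nodes, so the isolated element carries its own xmlns declarations in the output.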

Hope that gets you over the "Thing" thing :)

answered Nov 01 '22 22:11 by marty