
How to set namespace aware to false?

I'm trying to parse some XML with EclipseLink MOXy, and it's failing on the line with the xsi:schemaLocation attribute. If I remove that attribute, it parses fine. However, I've got 100 GiB of XML to wade through, and changing the source files is not an option.

It's been suggested that if I can set XmlParser.setNamespaceAware(false) then it should work - but I've got no idea how to configure this, without breaking right into the guts of MOXy.

<record>
<header>
    <!-- citation-id: 14404534; type: journal_article; -->
    <identifier>info:doi/10.1007/s10973-004-0435-2</identifier>
    <datestamp>2009-04-28</datestamp>
    <setSpec>J</setSpec>
    <setSpec>J:1007</setSpec>
    <setSpec>J:1007:2777</setSpec>
</header>
<metadata>
    <crossref xmlns="http://www.crossref.org/xschema/1.0"
        xsi:schemaLocation="http://www.crossref.org/xschema/1.0 http://www.crossref.org/schema/unixref1.0.xsd">
        <journal>
            <journal_metadata language="en">
[...]

The exception I get when the xsi: prefix is present is:

org.springframework.oxm.UnmarshallingFailureException: JAXB unmarshalling exception; nested exception is javax.xml.bind.UnmarshalException
 - with linked exception:
[Exception [EclipseLink-25004] (Eclipse Persistence Services - 2.4.0.v20120608-r11652): org.eclipse.persistence.exceptions.XMLMarshalException
Exception Description: An error occurred unmarshalling the document
Internal Exception: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[13,107]
Message: http://www.w3.org/TR/1999/REC-xml-names-19990114#AttributePrefixUnbound?crossref&xsi:schemaLocation&xsi]
EngineerBetter_DJ asked Nov 16 '12 12:11



2 Answers

There currently isn't an option in EclipseLink JAXB (MOXy) to tell it to ignore namespaces. But there is an approach you can use by leveraging a StAX parser.

Demo

You can create a StAX XMLStreamReader on the XML input that is not namespace aware and then have MOXy unmarshal from that.

package forum13416681;

import javax.xml.bind.*;
import javax.xml.stream.*;
import javax.xml.transform.stream.StreamSource;

public class Demo {

    public static void main(String[] args) throws Exception {
        JAXBContext jc = JAXBContext.newInstance(Foo.class);

        XMLInputFactory xif = XMLInputFactory.newFactory();
        xif.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
        StreamSource source = new StreamSource("src/forum13416681/input.xml");
        XMLStreamReader xsr = xif.createXMLStreamReader(source);

        Unmarshaller unmarshaller = jc.createUnmarshaller();
        Foo root = (Foo) unmarshaller.unmarshal(xsr);

        Marshaller marshaller = jc.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
        marshaller.marshal(root, System.out);
    }

}

Java Model (Foo)

package forum13416681;

import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Foo {

    private String bar;

    public String getBar() {
        return bar;
    }

    public void setBar(String bar) {
        this.bar = bar;
    }

}

Input (input.xml)

Below is a simplified version of the XML from your question. Note that this XML is not properly namespace qualified since it is missing the namespace declaration for the xsi prefix.

<?xml version="1.0" encoding="UTF-8"?>
<foo xsi:schemaLocation="http://www.crossref.org/xschema/1.0 http://www.crossref.org/schema/unixref1.0.xsd">
    <bar>Hello World</bar>
</foo>

Output

Below is the output from running the demo code.

<?xml version="1.0" encoding="UTF-8"?>
<foo>
   <bar>Hello World</bar>
</foo>
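
To see the effect of the IS_NAMESPACE_AWARE setting in isolation, without JAXB or MOXy involved, here is a minimal sketch that uses only the StAX API. It assumes the StAX implementation bundled with the JDK is the one returned by XMLInputFactory.newFactory(). The class name NamespaceAwarenessCheck and the helper readBar are made up for illustration, and the XML string is a one-line copy of input.xml above.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative only: class and method names are not part of MOXy or JAXB.
public class NamespaceAwarenessCheck {

    // Same shape as input.xml: the xsi prefix is used but never declared.
    static final String XML =
        "<foo xsi:schemaLocation=\"http://www.crossref.org/xschema/1.0 "
        + "http://www.crossref.org/schema/unixref1.0.xsd\">"
        + "<bar>Hello World</bar></foo>";

    // Returns the text of the first <bar> element, or null if there is none.
    // Throws XMLStreamException if the parser rejects the input.
    static String readBar(String xml, boolean namespaceAware) throws XMLStreamException {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        xif.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, namespaceAware);
        XMLStreamReader xsr = xif.createXMLStreamReader(new StringReader(xml));
        try {
            while (xsr.hasNext()) {
                if (xsr.next() == XMLStreamConstants.START_ELEMENT
                        && "bar".equals(xsr.getLocalName())) {
                    return xsr.getElementText();
                }
            }
            return null;
        } finally {
            xsr.close();
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            readBar(XML, true);
            System.out.println("namespace-aware: parsed");
        } catch (XMLStreamException e) {
            // The unbound xsi prefix is a namespace error, as in the question.
            System.out.println("namespace-aware: rejected");
        }
        System.out.println("not namespace-aware: " + readBar(XML, false));
    }
}
```

With namespace awareness switched off, the parser treats xsi:schemaLocation as an ordinary attribute name, so the unbound prefix no longer matters.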
bdoughan answered Sep 22 '22 19:09


Rather than disabling namespace awareness altogether, you may be able to use a StAX-implementation-specific mechanism to declare the xsi prefix in advance, then parse with namespaces enabled. For example, with Woodstox you can say:

import javax.xml.bind.*;
import javax.xml.stream.*;
import javax.xml.transform.stream.StreamSource;
import com.ctc.wstx.sr.BasicStreamReader;

public class Demo {

    public static void main(String[] args) throws Exception {
        JAXBContext jc = JAXBContext.newInstance("com.example");

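        // Assumes Woodstox is the StAX implementation that newFactory() returns;
        // with any other implementation the cast to BasicStreamReader below fails.
        XMLInputFactory xif = XMLInputFactory.newFactory();
        StreamSource source = new StreamSource("input.xml");
        XMLStreamReader xsr = xif.createXMLStreamReader(source);
        // Pre-declare the xsi prefix before any parsing events are read.
        ((BasicStreamReader)xsr).getInputElementStack().addNsBinding(
               "xsi", "http://www.w3.org/2001/XMLSchema-instance");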

and then create the unmarshaller and unmarshal the xsr as in Blaise's (bdoughan's) answer above. While this obviously ties you to one specific StAX implementation, it means that you don't have to modify your existing JAXB model classes if they expect the <crossref> element and its children to be in the http://www.crossref.org/xschema/1.0 namespace.
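
The trade-off can be sketched with plain StAX: a reader that is not namespace aware no longer reports the namespace of <crossref>, which is why a model mapped to that namespace would stop matching. This is a minimal sketch that assumes the JDK's bundled StAX implementation. The class RootNamespaceCheck and the helper rootNamespace are hypothetical names, and the XML is a cut-down, well-formed stand-in for the element in the question.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative only: class and method names are not part of Woodstox or JAXB.
public class RootNamespaceCheck {

    static final String CROSSREF_NS = "http://www.crossref.org/xschema/1.0";

    // Cut-down stand-in for the <crossref> element from the question.
    static final String XML =
        "<crossref xmlns=\"" + CROSSREF_NS + "\"><journal/></crossref>";

    // Returns the namespace URI the reader reports for the root element.
    static String rootNamespace(String xml, boolean namespaceAware) throws XMLStreamException {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        xif.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, namespaceAware);
        XMLStreamReader xsr = xif.createXMLStreamReader(new StringReader(xml));
        try {
            while (xsr.hasNext()) {
                if (xsr.next() == XMLStreamConstants.START_ELEMENT) {
                    return xsr.getNamespaceURI();
                }
            }
            return null;
        } finally {
            xsr.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespace-aware:     " + rootNamespace(XML, true));
        System.out.println("not namespace-aware: " + rootNamespace(XML, false));
    }
}
```

If the model classes are mapped without a namespace, as Foo is in the first answer, this difference does not matter and the simpler approach is enough.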

Ian Roberts answered Sep 24 '22 19:09