
Validation using JAXB and Stax to marshal XML document

Tags: java, xml, jaxb, xsd, stax

I have created an XML schema (foo.xsd) and used xjc to create my binding classes for JAXB. Let's say the root element is collection and I am writing N document objects, which are complex types.

Because I plan to write out large XML files, I am using StAX to write out the collection root element, and JAXB to marshal document subtrees using Marshaller.marshal(JAXBElement, XMLEventWriter). This is the approach recommended by JAXB's unofficial user's guide.

My question is, how can I validate the XML while it's being marshalled? If I bind a schema to the JAXB marshaller (using Marshaller.setSchema()), I get validation errors because I am only marshalling a subtree (it complains that it is not seeing the collection root element). I suppose what I really want to do is bind a schema to the StAX XMLEventWriter or something like that.

Any comments on this overall approach would be helpful. Basically I want to be able to use JAXB to marshal and unmarshal large XML documents without running out of memory, so if there's a better way to do this let me know.

bajafresh4life asked Mar 18 '10 15:03


2 Answers

Some StAX implementations seem to be able to validate output as it is written. See the following answer to a similar question:

Using Stax2 with Woodstox

Christian Semrau answered Nov 14 '22 03:11


You can make your root collection lazy and instantiate items only when the Marshaller calls Iterator.next(). Then a single call to marshal() will produce a huge, validated XML document. You won't run out of memory, because the beans that have already been serialized become eligible for garbage collection.

Also, it's OK to return null as a collection element if it needs to be conditionally skipped. No NullPointerException will be thrown.
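
To show the lazy-collection idea in isolation, here is a minimal sketch that uses only java.util. The names LazyListSketch and lazyList are made up for this illustration and no JAXB is involved; it only demonstrates that the elements of an AbstractList-backed view are created when an iterator reaches them, and that the list itself keeps no reference to them afterwards.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

class LazyListSketch {

    /** Read-only list view whose elements are built on demand by the given factory. */
    static <T> List<T> lazyList(final int size, final IntFunction<T> factory) {
        return new AbstractList<T>() {
            @Override
            public T get(final int index) {
                if (index < 0 || index >= size) {
                    throw new IndexOutOfBoundsException("index " + index);
                }
                // Nothing is cached: the element exists only while the caller holds it.
                return factory.apply(index);
            }

            @Override
            public int size() {
                return size;
            }
        };
    }

    public static void main(final String[] args) {
        final AtomicInteger created = new AtomicInteger();
        // Hypothetical factory: counts calls and returns null for one index.
        final List<String> docs = lazyList(5, i -> {
            created.incrementAndGet();
            return i == 2 ? null : "doc-" + i;
        });
        System.out.println("created before iteration: " + created.get());
        for (final String doc : docs) {
            System.out.println(doc);
        }
        System.out.println("created after iteration: " + created.get());
    }
}
```

In the setup described above, the factory would build one JAXB bean or JAXBElement per index instead of a string.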

The XML schema validator itself seems to consume little memory, even on huge XML documents.
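
One plausible reason is that schema validation can work on a stream of events rather than on a tree in memory. The following JDK-only sketch feeds SAX events by hand into a javax.xml.validation.ValidatorHandler. The inline schema (a collection of document strings), the class name and the isValid helper are assumptions made for this illustration and are not part of the original code.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.AttributesImpl;

class StreamingValidationSketch {

    // Hypothetical schema: <collection> holding any number of <document> strings.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='collection'><xs:complexType><xs:sequence>"
          + "<xs:element name='document' type='xs:string'"
          + " minOccurs='0' maxOccurs='unbounded'/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "</xs:schema>";

    static ValidatorHandler newHandler() {
        try {
            final Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            final ValidatorHandler handler = schema.newValidatorHandler();
            // Fail fast: rethrow every validation error instead of logging it.
            handler.setErrorHandler(new ErrorHandler() {
                @Override public void warning(final SAXParseException e) { }
                @Override public void error(final SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(final SAXParseException e) throws SAXException { throw e; }
            });
            return handler;
        } catch (final SAXException e) {
            throw new IllegalStateException("illustrative schema failed to compile", e);
        }
    }

    /** Streams a root element with {@code count} children named {@code childName} through the validator. */
    static boolean isValid(final String childName, final int count) {
        final ValidatorHandler handler = newHandler();
        final AttributesImpl noAttributes = new AttributesImpl();
        try {
            handler.startDocument();
            handler.startElement("", "collection", "collection", noAttributes);
            for (int i = 0; i < count; i++) {
                // Each child is validated as its events arrive; no tree is built here.
                handler.startElement("", childName, childName, noAttributes);
                final char[] text = ("doc " + i).toCharArray();
                handler.characters(text, 0, text.length);
                handler.endElement("", childName, childName);
            }
            handler.endElement("", "collection", "collection");
            handler.endDocument();
            return true;
        } catch (final SAXException e) {
            return false;
        }
    }

    public static void main(final String[] args) {
        System.out.println("many documents valid: " + isValid("document", 100_000));
        System.out.println("misnamed child valid: " + isValid("Header", 1));
    }
}
```

With JAXB, the events could come from Marshaller.marshal(Object, ContentHandler) instead of being written by hand; that wiring is not shown or tested here.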

See JAXB's ArrayElementProperty.serializeListBody()

import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

import javax.xml.XMLConstants;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.Marshaller;
import javax.xml.bind.SchemaOutputResolver;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAnyElement;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.namespace.QName;
import javax.xml.transform.Result;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name = "TestHuge")
public class TestHuge {

    // true emits the Header element last, which the generated schema rejects; false emits it first.
    static final boolean MISPLACE_HEADER = true;

    private static final int LIST_SIZE = 20000;

    static final String HEADER = "Header";

    static final String DATA = "Data";

    @XmlElement(name = HEADER)
    String header;

    @XmlElement(name = DATA)
    List<String> data;

    @XmlAnyElement
    List<Object> content;

    public static void main(final String[] args) throws Exception {

        final JAXBContext jaxbContext = JAXBContext.newInstance(TestHuge.class);

        final Schema schema = genSchema(jaxbContext);

        final Marshaller marshaller = jaxbContext.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        marshaller.setSchema(schema);

        final TestHuge instance = new TestHuge();

        // Lazy view: each child is created only when the marshaller asks for it via get(index).
        instance.content = new AbstractList<Object>() {

            @Override
            public Object get(final int index) {
                return instance.createChild(index);
            }

            @Override
            public int size() {
                return LIST_SIZE;
            }
        };

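        // Output is discarded by a no-op Writer; only marshalling and validation are exercised.
        // With MISPLACE_HEADER = true, this throws MarshalException ... Invalid content was found starting with element 'Header'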
        marshaller.marshal(instance, new Writer() {

            @Override
            public void write(final char[] cbuf, final int off, final int len) throws IOException {}

            @Override
            public void write(final int c) throws IOException {}

            @Override
            public void flush() throws IOException {}

            @Override
            public void close() throws IOException {}
        });

    }

    private JAXBElement<String> createChild(final int index) {
        if (index % 1000 == 0) {
            System.out.println("serialized so far: " + index);
        }
        final String tag = index == getHeaderIndex(content) ? HEADER : DATA;

        final String bigStr = new String(new char[1000000]);
        return new JAXBElement<String>(new QName(tag), String.class, bigStr);
    }

    private static int getHeaderIndex(final List<?> list) {
        return MISPLACE_HEADER ? list.size() - 1 : 0;
    }

    private static Schema genSchema(final JAXBContext jc) throws Exception {
        final List<StringWriter> outs = new ArrayList<>();
        jc.generateSchema(new SchemaOutputResolver() {

            @Override
            public Result createOutput(final String namespaceUri, final String suggestedFileName)
                    throws IOException {
                final StringWriter out = new StringWriter();
                outs.add(out);
                final StreamResult streamResult = new StreamResult(out);
                streamResult.setSystemId("");
                return streamResult;
            }
        });
        final StreamSource[] sources = new StreamSource[outs.size()];
        for (int i = 0; i < outs.size(); i++) {
            final StringWriter out = outs.get(i);
            sources[i] = new StreamSource(new StringReader(out.toString()));
        }
        final SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        final Schema schema = sf.newSchema(sources);
        return schema;
    }
}
basin answered Nov 14 '22 04:11