I need to parse potentially large XML files whose schema is already provided to me in several XSD files, so XML binding is strongly preferred. I'd like to know whether I can use JAXB to parse the file in chunks and, if so, how.
As a side note on parser choice: a DOM parser can be faster to work with once a document is loaded, because the whole tree sits in memory, but that makes it a fit only for smaller files; a streaming SAX-style parser is the better choice for large files. DOM is also what you want when you need to create or modify XML documents in Java.
Generally JAXB is quite efficient, and you shouldn't worry about memory unless your application handles very large XML documents.
To read XML, first get the JAXBContext. It is the entry point to the JAXB API and provides methods for the unmarshal, marshal and validate operations. Then get an Unmarshaller instance from the JAXBContext. Its unmarshal() method unmarshals XML data from the specified source and returns the resulting content tree.
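For example, here is a minimal sketch of that flow. YourRoot stands in for a class generated from your XSDs (e.g. with xjc) and annotated as a root element, and input.xml is just a placeholder file name:

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import java.io.File;

public class SimpleUnmarshal {
    public static void main(String[] args) throws Exception {
        // YourRoot stands in for a class generated from your XSDs (e.g. with xjc)
        JAXBContext context = JAXBContext.newInstance(YourRoot.class);
        Unmarshaller unmarshaller = context.createUnmarshaller();

        // unmarshal() builds the whole content tree in memory and returns it
        YourRoot root = (YourRoot) unmarshaller.unmarshal(new File("input.xml"));
        System.out.println(root);
    }
}

Note that this straightforward approach materializes the entire document in memory, which is exactly what the chunked approaches below avoid.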
Java Architecture for XML Binding (JAXB) provides a fast and convenient way to bind XML schemas and Java representations, making it easy for Java developers to incorporate XML data and processing functions in Java applications.
Because code matters, here is a PartialUnmarshaller that reads a big file in chunks. It can be used like this: new PartialUnmarshaller<YourClass>(stream, YourClass.class)
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.*;
import java.io.InputStream;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import static javax.xml.stream.XMLStreamConstants.*;

public class PartialUnmarshaller<T> {
    XMLStreamReader reader;
    Class<T> clazz;
    Unmarshaller unmarshaller;

    public PartialUnmarshaller(InputStream stream, Class<T> clazz)
            throws XMLStreamException, FactoryConfigurationError, JAXBException {
        this.clazz = clazz;
        this.unmarshaller = JAXBContext.newInstance(clazz).createUnmarshaller();
        this.reader = XMLInputFactory.newInstance().createXMLStreamReader(stream);

        /* ignore headers */
        skipElements(START_DOCUMENT, DTD);
        /* ignore root element */
        reader.nextTag();
        /* if there's no tag, ignore root element's end */
        skipElements(END_ELEMENT);
    }

    /* unmarshal the element the cursor is on, then skip to the next chunk (or the end) */
    public T next() throws XMLStreamException, JAXBException {
        if (!hasNext())
            throw new NoSuchElementException();

        T value = unmarshaller.unmarshal(reader, clazz).getValue();

        skipElements(CHARACTERS, END_ELEMENT);
        return value;
    }

    public boolean hasNext() throws XMLStreamException {
        return reader.hasNext();
    }

    public void close() throws XMLStreamException {
        reader.close();
    }

    /* advance the cursor past any of the given event types */
    void skipElements(int... elements) throws XMLStreamException {
        int eventType = reader.getEventType();
        List<Integer> types = asList(elements);

        while (types.contains(eventType))
            eventType = reader.next();
    }

    /* box the int event-type constants so List.contains(Integer) works */
    static List<Integer> asList(int... values) {
        return IntStream.of(values).boxed().collect(Collectors.toList());
    }
}
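A rough usage sketch, where Item is a placeholder for whichever XSD-generated class is bound to the repeated element and big-file.xml is a placeholder path:

import java.io.FileInputStream;
import java.io.InputStream;

public class PartialUnmarshallerDemo {
    public static void main(String[] args) throws Exception {
        try (InputStream stream = new FileInputStream("big-file.xml")) {
            // Item is whatever class is bound to the repeated element in your schema
            PartialUnmarshaller<Item> pu = new PartialUnmarshaller<>(stream, Item.class);
            while (pu.hasNext()) {
                Item item = pu.next();
                // act on one chunk at a time; nothing else is kept in memory
                System.out.println(item);
            }
            pu.close();
        }
    }
}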
This is detailed in the user guide. The JAXB download from http://jaxb.java.net/ includes an example of how to parse one chunk at a time.
When a document is large, it's usually because there are repetitive parts in it. Perhaps it's a purchase order with a large list of line items, or perhaps it's an XML log file with a large number of log entries.
This kind of XML is suitable for chunk-processing; the main idea is to use the StAX API, run a loop, and unmarshal individual chunks separately. Your program acts on a single chunk and then throws it away. In this way, you keep at most one chunk in memory, which allows you to process large documents.
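As an illustration of that idea (not the exact RI sample code), here is a minimal sketch that walks the document with an XMLStreamReader and unmarshals each repeated element on its own. LineItem, the element name "lineItem", and purchase-order.xml are assumptions standing in for whatever your schema generates:

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;
import java.io.InputStream;

public class ChunkedUnmarshalLoop {
    public static void main(String[] args) throws Exception {
        // LineItem and "lineItem" are placeholders for your XSD-generated class
        // and the repeated element name in your documents
        Unmarshaller unmarshaller =
                JAXBContext.newInstance(LineItem.class).createUnmarshaller();

        try (InputStream in = new FileInputStream("purchase-order.xml")) {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(in);

            int event = reader.getEventType();
            while (reader.hasNext()) {
                if (event == XMLStreamConstants.START_ELEMENT
                        && "lineItem".equals(reader.getLocalName())) {
                    // unmarshal just this element; the cursor ends up after its end tag
                    LineItem item =
                            unmarshaller.unmarshal(reader, LineItem.class).getValue();
                    process(item);                  // act on the chunk...
                    event = reader.getEventType();  // ...then carry on from where JAXB left off
                } else {
                    event = reader.next();          // skip everything that isn't a chunk
                }
            }
            reader.close();
        }
    }

    static void process(LineItem item) {
        // only one LineItem is referenced at a time, so memory use stays flat
        System.out.println(item);
    }
}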
See the streaming-unmarshalling example and the partial-unmarshalling example in the JAXB RI distribution for more about how to do this. The streaming-unmarshalling example has the advantage that it can handle chunks at an arbitrary nesting level, but it requires you to deal with the push model: the JAXB unmarshaller will "push" new chunks to you, and you'll need to process them right there.
In contrast, the partial-unmarshalling example works in a pull model (which usually makes processing easier), but this approach has some limitations in databinding the portions other than the repeated part.