I need to parse a large, complex XML file and write it to a flat file. Can you give some advice?
File size: 500 MB
Record count: 100K
XML structure:
<Msg>
    <MsgHeader>
        <!-- Some of the fields in the MsgHeader need to be mapped to a Java object -->
    </MsgHeader>
    <GroupA>
        <GroupAHeader/>
        <!-- Some of the fields in the GroupAHeader need to be mapped to a Java object -->
        <GroupAMsg/>
        <!-- 50K records -->
        <GroupAMsg/>
        <GroupAMsg/>
        <GroupAMsg/>
    </GroupA>
    <GroupB>
        <GroupBHeader/>
        <GroupBMsg/>
        <!-- 50K records -->
        <GroupBMsg/>
        <GroupBMsg/>
        <GroupBMsg/>
    </GroupB>
</Msg>
Within Spring Batch, I've written my own StAX event item reader that works a bit more specifically than the approaches mentioned previously. Basically, it stuffs the elements you name into a map and passes them, together with each item, to the ItemProcessor. From there, you're free to transform the "GatheredElementItem" into a single object (see CompositeItemProcessor). Apologies for the bit of copy/paste from StaxEventItemReader, but I don't think it's avoidable.
From here, you're free to use whatever OXM marshaller you'd like; I happen to use JAXB as well.
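import java.util.Map;
import java.util.Set;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import org.springframework.batch.item.NonTransientResourceException;
import org.springframework.batch.item.xml.StaxEventItemReader;

public class ElementGatheringStaxEventItemReader<T> extends StaxEventItemReader<T> {
    private Map<String, String> gatheredElements;
    private Set<String> elementsToGather;
    ...
    @Override
    protected boolean moveCursorToNextFragment(XMLEventReader reader) throws NonTransientResourceException {
        try {
            while (true) {
                // Skip everything up to the next start element.
                while (reader.peek() != null && !reader.peek().isStartElement()) {
                    reader.nextEvent();
                }
                if (reader.peek() == null) {
                    return false; // end of document
                }
                QName startElementName = ((StartElement) reader.peek()).getName();
                if (elementsToGather.contains(startElementName.getLocalPart())) {
                    reader.nextEvent(); // move past the actual start element
                    XMLEvent dataEvent = reader.nextEvent();
                    // An empty element is followed by its end element, not by text;
                    // asCharacters() would throw a ClassCastException in that case.
                    if (dataEvent.isCharacters()) {
                        gatheredElements.put(startElementName.getLocalPart(), dataEvent.asCharacters().getData());
                    }
                    continue;
                }
                if (startElementName.getLocalPart().equals(fragmentRootElementName)) {
                    if (fragmentRootElementNameSpace == null || startElementName.getNamespaceURI().equals(fragmentRootElementNameSpace)) {
                        return true; // cursor is positioned on the next fragment
                    }
                }
                reader.nextEvent();
            }
        } catch (XMLStreamException e) {
            throw new NonTransientResourceException("Error while reading from event reader", e);
        }
    }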
    @SuppressWarnings("unchecked")
    @Override
    protected T doRead() throws Exception {
        T item = super.doRead();
        if (item == null) {
            return null;
        }
        // Copy the map (java.util.HashMap) so each item keeps the values gathered
        // at the time it was read, even if later headers overwrite them.
        T result = (T) new GatheredElementItem<T>(item, new HashMap<>(gatheredElements));
        if (log.isDebugEnabled()) {
            log.debug("Read GatheredElementItem: " + result);
        }
        return result;
    }
}
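To see the gathering idea in isolation, here is a minimal sketch that uses only the JDK's StAX API, without Spring Batch. It is not the reader above: the class name GatherSketch, the method flatten, and the header fields BatchId and Currency are all made up for illustration, and it assumes both the gathered elements and the fragment elements contain text only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, not part of Spring Batch or of the reader above.
final class GatherSketch {

    /**
     * Streams the XML once. Text of elements named in elementsToGather is
     * remembered; every fragmentName element produces one delimited line made
     * of the values gathered so far (in key order) plus the fragment's text.
     */
    static List<String> flatten(String xml, Set<String> elementsToGather, String fragmentName) {
        Map<String, String> gathered = new TreeMap<>(); // sorted for stable column order
        List<String> lines = new ArrayList<>();
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (!event.isStartElement()) {
                        continue;
                    }
                    String name = event.asStartElement().getName().getLocalPart();
                    if (elementsToGather.contains(name)) {
                        // Assumes a text-only element; getElementText() throws otherwise.
                        gathered.put(name, reader.getElementText());
                    } else if (name.equals(fragmentName)) {
                        List<String> columns = new ArrayList<>(gathered.values());
                        columns.add(reader.getElementText());
                        lines.add(String.join("|", columns));
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Error while reading XML", e);
        }
        return lines;
    }
}
```

Calling GatherSketch.flatten(xml, Set.of("BatchId", "Currency"), "GroupAMsg") on a document whose GroupAHeader holds those two fields would yield one line per GroupAMsg, each prefixed with the header values. Because the document is streamed, memory use stays flat no matter how many records follow the header.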
The gathered element class is pretty basic:
public class GatheredElementItem<T> {
    private final T item;
    private final Map<String, String> gatheredElements;
    ...
}
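Since the goal is a flat file, the processor step could turn each gathered item into one delimited line. The following is a rough sketch under assumptions: the constructor and the accessors getItem() and getGatheredElements() are guesses at what the elided class body contains, and FlatLineFormatter with its toLine method is a hypothetical helper, not a Spring Batch class.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.function.Function;

// Assumed shape of the gathered element class; the accessors are hypothetical.
final class GatheredElementItem<T> {
    private final T item;
    private final Map<String, String> gatheredElements;

    GatheredElementItem(T item, Map<String, String> gatheredElements) {
        this.item = item;
        this.gatheredElements = Collections.unmodifiableMap(new HashMap<>(gatheredElements));
    }

    T getItem() {
        return item;
    }

    Map<String, String> getGatheredElements() {
        return gatheredElements;
    }
}

// Hypothetical helper that builds one flat-file line per item.
final class FlatLineFormatter {

    /**
     * Emits the requested header fields in the given order (empty string when
     * a field was never gathered), followed by the formatted item itself.
     */
    static <T> String toLine(GatheredElementItem<T> gathered,
                             List<String> headerFields,
                             Function<T, String> itemFormatter,
                             String delimiter) {
        StringJoiner line = new StringJoiner(delimiter);
        for (String field : headerFields) {
            line.add(gathered.getGatheredElements().getOrDefault(field, ""));
        }
        line.add(itemFormatter.apply(gathered.getItem()));
        return line.toString();
    }
}
```

In a Spring Batch job, logic like this would probably sit in the process method of an ItemProcessor, with the resulting strings handed to a FlatFileItemWriter. Passing the header field order explicitly keeps the columns stable even when a header field is missing from the input.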