Is there a way I can use STAX parser to efficiently parse an XML document with multiple lists of objects of different classes (POJO). The exact structure of my XML is as follows (class names are not real)
<?xml version="1.0" encoding="utf-8"?>
<root>
<notes />
<category_alpha>
<list_a>
<class_a_object></class_a_object>
<class_a_object></class_a_object>
<class_a_object></class_a_object>
<class_a_object></class_a_object>
.
.
.
</list_a>
<list_b>
<class_b_object></class_b_object>
<class_b_object></class_b_object>
<class_b_object></class_b_object>
<class_b_object></class_b_object>
.
.
.
</list_b>
</category_alpha>
<category_beta>
<class_c_object></class_c_object>
<class_c_object></class_c_object>
<class_c_object></class_c_object>
<class_c_object></class_c_object>
<class_c_object></class_c_object>
.
.
.
.
.
</category_beta>
</root>
I have been using the STAX Parser i.e. XStream library, link: XStream
It works absolutely fine as long as my XML contains list of one class of objects but I dont know how to handle an XML that contains list of objects of different classes.
Any help would be really appreciated and please let me know if I have not provided enough information or I haven't phrased the question properly.
XMLEventReader reads an XML file as a stream of events. It also provides the methods necessary to parse the XML. The most important methods are: isStartElement(): checks if the current event is a StartElement (start tag) isEndElement(): checks if the current event is an EndElement (end tag)
The SAX parser thus “pushes” events into your handler. With this push model of API you have no control over how and when the parser iterates over the file. Once you start the parser, it iterates all the way until the end, calling your handler for each and every XML event in the input XML document.
Java XML Parser - DOM DOM Parser is the easiest java xml parser to learn. DOM parser loads the XML file into memory and we can traverse it node by node to parse the XML. DOM Parser is good for small files but when file size increases it performs slow and consumes more memory.
StAX is a PULL API, whereas SAX is a PUSH API. It means in case of StAX parser, a client application needs to ask the StAX parser to get information from XML whenever it needs.
You can use Declarative Stream Mapping (DSM) stream parsing library to easily convert complex XML to java class. It uses StAX to parse XML.
I skip getting notes tag and add a field inside class_x_object tags for demostration.
Here is the XML:
<?xml version="1.0" encoding="utf-8"?>
<root>
<notes />
<category_alpha>
<list_a>
<class_a_object>
<fieldA>A1</fieldA>
</class_a_object>
<class_a_object>
<fieldA>A2</fieldA>
</class_a_object>
<class_a_object>
<fieldA>A3</fieldA>
</class_a_object>
</list_a>
<list_b>
<class_b_object>
<fieldB>B1</fieldB>
</class_b_object>
<class_b_object>
<fieldB>B2</fieldB>
</class_b_object>
<class_b_object>
<fieldB>B3</fieldB>
</class_b_object>
</list_b>
</category_alpha>
<category_beta>
<class_c_object>
<fieldC>C1</fieldC>
</class_c_object>
<class_c_object>
<fieldC>C2</fieldC>
</class_c_object>
<class_c_object>
<fieldC>C3</fieldC>
</class_c_object>
</category_beta>
</root>
First of all, you must define the mapping between XML data and your class fields in yaml or JSON format.
Here are the mapping definitions:
result:
type: object
path: /root
fields:
listOfA:
type: array
path: .*class_a_object # path is regex
fields:
fieldOfA:
path: fieldA
listOfB:
type: array
path: .*class_b_object
fields:
fieldOfB:
path: fieldB
listOfC:
type: array
path: .*class_c_object
fields:
fieldOfC:
path: fieldC
Java class that you want to deserialize:
public class Root {
public List<A> listOfA;
public List<B> listOfB;
public List<C> listOfC;
public static class A{
public String fieldOfA;
}
public static class B{
public String fieldOfB;
}
public static class C{
public String fieldOfC;
}
}
Java Code to parse XML:
DSM dsm=new DSMBuilder(new File("path/to/mapping.yaml")).setType(DSMBuilder.TYPE.XML).create(Root.class);
Root root = (Root)dsm.toObject(xmlFileContent);
// write root object as json
dsm.getObjectMapper().writerWithDefaultPrettyPrinter().writeValue(System.out, object);
Here is output:
{
"listOfA" : [ {"fieldOfA" : "A1"}, {"fieldOfA" : "A2"}, {"fieldOfA" : "A3"} ],
"listOfB" : [ {"fieldOfB" : "B1"}, {"fieldOfB" : "B2"}, "fieldOfB" : "B3"} ],
"listOfC" : [ {"fieldOfC" : "C1"}, {"fieldOfC" : "C2"}, {"fieldOfC" : "C3"} ]
}
UPDATE:
As I understand from your comment, you want to read big XML file as a stream. and process data while you are reading the file.
DSM allows you to do process data while you are reading XML.
Declare three different functions to process partial data.
FunctionExecutor processA=new FunctionExecutor(){
@Override
public void execute(Params params) {
Root.A object=params.getCurrentNode().toObject(Root.A.class);
// process aClass; save to db. call service etc.
}
};
FunctionExecutor processB=new FunctionExecutor(){
@Override
public void execute(Params params) {
Root.B object=params.getCurrentNode().toObject(Root.B.class);
// process aClass; save to db. call service etc.
}
};
FunctionExecutor processC=new FunctionExecutor(){
@Override
public void execute(Params params) {
Root.C object=params.getCurrentNode().toObject(Root.C.class);
// process aClass; save to db. call service etc.
}
};
Register function to DSM
DSMBuilder builder = new DSMBuilder(new File("path/to/mapping.yaml")).setType(DSMBuilder.TYPE.XML);
// register function
builder.registerFunction("processA",processA);
builder.registerFunction("processB",processB);
builder.registerFunction("processC",processC);
DSM dsm= builder.create();
Object object = dsm.toObject(xmlContent);
change Mapping file to call registered function
result:
type: object
path: /root
fields:
listOfA:
type: object
function: processA # when 'class_a_object' tag closed processA function will be executed.
path: .*class_a_object # path is regex
fields:
fieldOfA:
path: fieldA
listOfB:
type: object
path: .*class_b_object
function: processB# register function
fields:
fieldOfB:
path: fieldB
listOfC:
type: object
path: .*class_c_object
function: processC# register function
fields:
fieldOfC:
path: fieldC
You could use Java Architecture for XML binding JAXB and Unmarshall using the POJO classes as mentioned below.
Create POJO classes first (I have taken few nodes from your XML file and created the POJO. You can do the similar for the rest). Below is the XML I considered.
<?xml version="1.0" encoding="utf-8"?>
<root>
<category_alpha>
<list_a>
<class_a_object></class_a_object>
<class_a_object></class_a_object>
<class_a_object></class_a_object>
<class_a_object></class_a_object>
</list_a>
<list_b>
<class_b_object></class_b_object>
<class_b_object></class_b_object>
<class_b_object></class_b_object>
<class_b_object></class_b_object>
</list_b>
</category_alpha>
</root>
Below are the POJO classes for Root, category_alpha, list_a, list_b, class_a_object and class_b_object
import java.util.List;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
@XmlRootElement(name = "root")
@XmlAccessorType (XmlAccessType.FIELD)
public class Root {
@XmlElement(name = "category_alpha")
private List<CategoryAlpha> categoryAlphaList = null;
public List<CategoryAlpha> getCategoryAlphaList() {
return categoryAlphaList;
}
public void setCategoryAlphaList(List<CategoryAlpha> categoryAlphaList) {
this.categoryAlphaList = categoryAlphaList;
}
}
Import the similar java imports to the above class here in the following classes.
@XmlRootElement(name = "category_alpha")
@XmlAccessorType (XmlAccessType.FIELD)
public class CategoryAlpha {
@XmlElement(name = "list_a")
private List<ListAClass> list_a_collectionlist = null;
@XmlElement(name = "list_b")
private List<ListBClass> list_b_collectionlist = null;
public List<ListAClass> getList_a_collectionlist() {
return list_a_collectionlist;
}
public void setList_a_collectionlist(List<ListAClass> list_a_collectionlist) {
this.list_a_collectionlist = list_a_collectionlist;
}
public List<ListBClass> getList_b_collectionlist() {
return list_b_collectionlist;
}
public void setList_b_collectionlist(List<ListBClass> list_b_collectionlist) {
this.list_b_collectionlist = list_b_collectionlist;
}
}
@XmlRootElement(name = "list_a")
@XmlAccessorType (XmlAccessType.FIELD)
public class ListAClass {
@XmlElement(name = "class_a_object")
private List<ClassAObject> classAObjectList = null;
public List<ClassAObject> getClassAObjectList() {
return classAObjectList;
}
public void setClassAObjectList(List<ClassAObject> classAObjectList) {
this.classAObjectList = classAObjectList;
}
}
@XmlRootElement(name = "list_b")
@XmlAccessorType (XmlAccessType.FIELD)
public class ListBClass {
@XmlElement(name = "class_b_object")
private List<ClassBObject> classBObjectList = null;
public List<ClassBObject> getClassBObjectList() {
return classBObjectList;
}
public void setClassBObjectList(List<ClassBObject> classBObjectList) {
this.classBObjectList = classBObjectList;
}
}
@XmlRootElement(name = "class_a_object")
@XmlAccessorType (XmlAccessType.FIELD)
public class ClassAObject {
}
@XmlRootElement(name = "class_b_object")
@XmlAccessorType (XmlAccessType.FIELD)
public class ClassBObject {
}
Here is the Main class
import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
public class UnmarshallMainClass {
public static void main(String[] args) throws JAXBException {
JAXBContext jaxbContext = JAXBContext.newInstance(Root.class);
Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
// This root object contains all the list of objects you are looking for
Root emps = (Root) jaxbUnmarshaller.unmarshal( new File("sample.xml") );
}
}
By using the getters in the root object and other objects you can retrieve the list of all objects inside the root similar like below.
List<CategoryAlpha> categoryAlphaList = emps.getCategoryAlphaList();
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With