Using org.xml.sax.helpers.DefaultHandler, can you determine whether you're at a leaf node within endElement(String, String, String)?
Or do you need to use a DOM parser to determine this?
Let's start with some basic definitions:
An XML document is an ordered, labeled tree. Each node of the tree is an XML element and is written with an opening and closing tag.
(from here). The great part about that is that XML files have a very regular, simple structure. For example, a leaf node is simply a node that doesn't have any children.
Now: the endElement() method is invoked whenever a SAX parser encounters the closing tag of an element. Assuming that your XML is well-formed, that also means that the parser has already given you a corresponding startElement() call!
In other words, all the information you need to determine whether you are "ending" a leaf node is available to you.
Take this example:
<outer>
<inner/>
</outer>
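This leads to the following sequence of events/callbacks (ignoring the characters() calls for the whitespace between the tags; note that the empty-element tag <inner/> still produces both a startElement() and an endElement() call):
startElement(outer)
startElement(inner)
endElement(inner)
endElement(outer)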
So, "obviously", when your handler remembers the history of events, determining which of inner or outer is a leaf node is straightforward!
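To see these callbacks for yourself, here is a minimal sketch, assuming a standard JAXP setup (SaxEventLog and its log method are hypothetical names chosen for illustration). It logs each startElement()/endElement() call for the example above. The only "history" it keeps is whether the previous callback was a startElement(), which is already enough to label each closed element as a leaf or not.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class: records start/end callbacks and a leaf verdict.
public class SaxEventLog {

    // Parses the XML and returns one line per startElement()/endElement() call.
    // An element is reported as a leaf when its endElement() comes directly
    // after a startElement(), i.e. no child element was opened in between.
    public static List<String> log(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean lastEventWasStart = false; // the remembered "history"

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("startElement(" + qName + ")");
                lastEventWasStart = true;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String kind = lastEventWasStart ? "leaf" : "non-leaf";
                events.add("endElement(" + qName + ") -> " + kind);
                lastEventWasStart = false;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        log("<outer>\n<inner/>\n</outer>").forEach(System.out::println);
    }
}
```

Running main prints the four callbacks in the order listed above, with inner labelled as a leaf and outer as a non-leaf. qName is used because a namespace-unaware parser (the JAXP default) may leave localName empty.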
Thus, the answer is: no, you don't need a DOM parser. In the end, the DOM is constructed from the very same information anyway! If the DOM parser can deduce the "scope" of objects, so can your SAX parser.
But just for the record: you still need to carefully implement your data structures that keep track of "started", "open" and "ended" tags, for example to correctly determine that this one:
<outer> <inner> <inner/> </inner> </outer>
represents two non-leaf nodes (outer and the first inner) and one leaf node (the innermost inner).
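One way to implement such bookkeeping is a stack with one entry per open element, recording whether that element has opened a child element yet. The sketch below is an illustration under that assumption (LeafStackHandler, analyze and the path-style report strings are hypothetical names, not part of any API). Because it keeps the whole stack of open elements, it can also report where each leaf sits in the document.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tracks, per open element, whether it has a child element.
public class LeafStackHandler extends DefaultHandler {

    private final Deque<Boolean> hasChild = new ArrayDeque<>(); // top = innermost open element
    private final Deque<String> path = new ArrayDeque<>();      // open element names, root first
    private final List<String> report = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (!hasChild.isEmpty()) {
            hasChild.pop();
            hasChild.push(true); // the enclosing element now has a child element
        }
        hasChild.push(false);    // the new element has no child elements (yet)
        path.addLast(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        boolean leaf = !hasChild.pop();
        report.add(String.join("/", path) + (leaf ? " is a leaf" : " is not a leaf"));
        path.removeLast();
    }

    // Parses the XML and returns one report line per element, in end-tag order.
    public static List<String> analyze(String xml) {
        LeafStackHandler handler = new LeafStackHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.report;
    }

    public static void main(String[] args) {
        analyze("<outer> <inner> <inner/> </inner> </outer>").forEach(System.out::println);
    }
}
```

For the nested example, the report lists outer/inner/inner as the only leaf, followed by outer/inner and outer as non-leaves, in the order their end tags are reached.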
From an implementation standpoint, you can do this with a single boolean flag that tracks whether the element being closed is a potential leaf node. The flag is set to true whenever you enter an element (startElement), and it is cleared by the first endElement that follows. Only that first endElement, which necessarily closes an element without child elements, gets the leaf node logic applied to it.
The flag is set again by every subsequent startElement.
If several leaf nodes are siblings, the isLeafNode flag is set and consumed once for each of them, so every one of them is detected.
The reasoning behind this becomes clear if we imagine parsing the XML as operating a stack: startElement calls are pushes onto the stack, and endElement calls are pops. The first pop after a push removes a leaf node. Subsequent pops remove non-leaf nodes, until another push starts the cycle again.
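import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// The class name is illustrative; only the two callbacks matter.
public class LeafNodeHandler extends DefaultHandler {
    private boolean isLeafNode = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        isLeafNode = true; // push: a newly opened element is a potential leaf
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (isLeafNode) {
            // do leaf node logic (first pop after a push)
        }
        isLeafNode = false; // later pops before the next push close non-leaf elements
    }
}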
So, for the following XML, the leaf nodes are bar, bop, beep and moo (the elements containing the text Leaf):
<foo>
<bar>Leaf</bar>
<baz>
<bop>Leaf</bop>
<beep>Leaf</beep>
<blip>
<moo>Leaf</moo>
</blip>
</baz>
</foo>
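As a check, here is a self-contained sketch that plugs the flag-based handler from above into a JAXP SAXParser and runs it on this XML (LeafFlagDemo and leafNames are hypothetical names; adding qName to a list stands in for the "leaf node logic"). Note that "leaf" here means "no child elements": elements with text content, such as <bar>Leaf</bar>, still count as leaves.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class for the single-flag approach.
public class LeafFlagDemo {

    public static final String XML =
            "<foo>\n"
          + "  <bar>Leaf</bar>\n"
          + "  <baz>\n"
          + "    <bop>Leaf</bop>\n"
          + "    <beep>Leaf</beep>\n"
          + "    <blip>\n"
          + "      <moo>Leaf</moo>\n"
          + "    </blip>\n"
          + "  </baz>\n"
          + "</foo>\n";

    // Returns the names of leaf elements, in the order their end tags appear.
    public static List<String> leafNames(String xml) {
        List<String> leaves = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean isLeafNode = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                isLeafNode = true; // every newly opened element might be a leaf
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (isLeafNode) {
                    leaves.add(qName); // the "leaf node logic" in this demo
                }
                isLeafNode = false; // further end tags before the next start tag close parents
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return leaves;
    }

    public static void main(String[] args) {
        System.out.println(leafNames(XML)); // prints [bar, bop, beep, moo]
    }
}
```

The whitespace between tags arrives through characters(), which this handler does not override, so it has no effect on the flag.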