I have a question concerning XML, Java's use of DOM, and empty nodes. I am currently working on a project wherein I take an XML descriptor file of abstract machines (for text parsing) and parse a series of input strings with them. The actual building and interpretation of these abstract machines is all done and working fine, but I have come across a rather interesting XML requirement. Specifically, I need to be able to turn an empty InputString node into an empty string ("") and still execute my parsing routines. The problem, however, occurs when I attempt to extract this blank node from my XML tree. This causes a null pointer exception and then generally bad things start happening. Here is the offending snippet of XML (Note the first element is empty):
<InputStringList>
<InputString></InputString>
<InputString>000</InputString>
<InputString>111</InputString>
<InputString>01001</InputString>
<InputString>1011011</InputString>
<InputString>1011000</InputString>
<InputString>01010</InputString>
<InputString>1010101110</InputString>
</InputStringList>
I extract my strings from the list using:
// Get input strings to be validated
xmlElement = (Element) xmlMachine.getElementsByTagName(XML_INPUT_STRING_LIST).item(0);
xmlNodeList = xmlElement.getElementsByTagName(XML_INPUT_STRING);
for (int j = 0; j < xmlNodeList.getLength(); j++) {
    // Add input string to list
    if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {
        arrInputStrings.add(xmlNodeList.item(j).getFirstChild().getNodeValue());
    } else {
        arrInputStrings.add("");
    }
}
How should I handle this empty case? I have found a lot of information on removing blank text nodes, but I still actually have to parse the blank nodes as empty strings. Ideally, I would like to avoid using a special character to denote a blank string.
Thank you in advance for your time.
if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {
The nodeValue of a text node shouldn't be null; it is firstChild itself that is null when the element is empty, and that is what should be checked for:
Node firstChild = xmlNodeList.item(j).getFirstChild();
arrInputStrings.add(firstChild == null ? "" : firstChild.getNodeValue());
However, note that this still assumes the content is exactly one text node. If you had an element with another element inside it, or some text followed by a CDATA section, getting the value of the first child alone isn't enough to read the whole text.
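To make that limitation concrete, here is a minimal sketch. The class name FirstChildDemo, its helper methods, and the sample element are made up for illustration and are not part of the original project. It parses an element containing a text node followed by a CDATA section, using a default (non-coalescing) parser, and shows that the first child carries only part of the text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original project.
class FirstChildDemo {

    // Parses a small XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // The null-safe first-child approach from the snippet above.
    static String firstChildValue(Element element) {
        Node firstChild = element.getFirstChild();
        return firstChild == null ? "" : firstChild.getNodeValue();
    }

    public static void main(String[] args) {
        Element mixed = parseRoot("<InputString>01<![CDATA[10]]></InputString>");
        // The element has two children: a text node and a CDATA section.
        System.out.println(mixed.getChildNodes().getLength());
        // Only the leading text node is returned; the CDATA part is lost.
        System.out.println(firstChildValue(mixed));
    }
}
```

With a coalescing parser the two nodes would be merged into one, so whether the first-child approach happens to work depends on parser configuration.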
What you really want is the textContent property from DOM Level 3 Core, which will give you all the text inside the element, however contained.
arrInputStrings.add(xmlNodeList.item(j).getTextContent());
This is available in Java 1.5 onwards.
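As a self-contained sketch of the getTextContent approach: the class name InputStringExtractor and the method readInputStrings are hypothetical, and the two tag-name constants are assumed to hold "InputStringList" and "InputString", as the XML in the question suggests. The loop mirrors the one in the question but needs no null check, because getTextContent returns an empty string for an element with no children.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original project.
class InputStringExtractor {

    // Assumed values; the question does not show how these constants are defined.
    static final String XML_INPUT_STRING_LIST = "InputStringList";
    static final String XML_INPUT_STRING = "InputString";

    // Returns the text of every InputString element, with "" for empty ones.
    static List<String> readInputStrings(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element list = (Element) doc.getElementsByTagName(XML_INPUT_STRING_LIST).item(0);
            NodeList nodes = list.getElementsByTagName(XML_INPUT_STRING);
            List<String> result = new ArrayList<>();
            for (int j = 0; j < nodes.getLength(); j++) {
                // getTextContent concatenates all text and CDATA inside the element.
                result.add(nodes.item(j).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<InputStringList>"
                + "<InputString></InputString>"
                + "<InputString>000</InputString>"
                + "<InputString>01<![CDATA[10]]></InputString>"
                + "</InputStringList>";
        // Prints each extracted string in brackets so the empty one is visible.
        for (String s : readInputStrings(xml)) {
            System.out.println("[" + s + "]");
        }
    }
}
```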
You could use a library like jOOX to generally simplify standard DOM manipulation. With jOOX, you'd get the list of strings as such:
List<String> strings = $(xmlMachine).find(XML_INPUT_STRING_LIST)
.find(XML_INPUT_STRING)
.texts();