 

Handling Empty Nodes Using Java DOM

I have a question concerning XML, Java's DOM API, and empty nodes. I am working on a project in which I take an XML descriptor file of abstract machines (for text parsing) and parse a series of input strings with them. Building and interpreting these abstract machines is done and working fine, but I have come across an interesting XML requirement: I need to turn an empty InputString node into an empty string ("") and still execute my parsing routines. The problem occurs when I attempt to extract this blank node from my XML tree. It causes a NullPointerException, and then generally bad things start happening. Here is the offending snippet of XML (note that the first element is empty):

    <InputStringList>
        <InputString></InputString>
        <InputString>000</InputString>
        <InputString>111</InputString>
        <InputString>01001</InputString>
        <InputString>1011011</InputString>
        <InputString>1011000</InputString>
        <InputString>01010</InputString>
        <InputString>1010101110</InputString>
    </InputStringList>

I extract my strings from the list using:

//Get input strings to be validated
xmlElement = (Element)xmlMachine.getElementsByTagName(XML_INPUT_STRING_LIST).item(0);
xmlNodeList = xmlElement.getElementsByTagName(XML_INPUT_STRING);
for (int j = 0; j < xmlNodeList.getLength(); j++) {

    //Add input string to list
    if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {
        arrInputStrings.add(xmlNodeList.item(j).getFirstChild().getNodeValue());

    } else {
        arrInputStrings.add("");

    }
}
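One way to confirm where the NullPointerException comes from is to parse a trimmed copy of the XML and ask each InputString element whether it has a first child at all. This is a minimal sketch using the JDK's built-in DOM parser; the class name, the method name and the hard-coded tag name (standing in for XML_INPUT_STRING) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EmptyNodeDiagnosis {
    // Trimmed copy of the question's XML; the first element is empty
    static final String XML = "<InputStringList>"
            + "<InputString></InputString>"
            + "<InputString>000</InputString>"
            + "</InputStringList>";

    // For each InputString element, report whether getFirstChild() returns a node
    public static boolean[] hasFirstChild(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("InputString");
        boolean[] result = new boolean[nodes.getLength()];
        for (int i = 0; i < nodes.getLength(); i++) {
            result[i] = nodes.item(i).getFirstChild() != null;
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        boolean[] flags = hasFirstChild(XML);
        for (int i = 0; i < flags.length; i++) {
            // false for the empty element: calling getNodeValue() on its
            // missing first child is what throws the NullPointerException
            System.out.println("InputString " + i + " has a first child: " + flags[i]);
        }
    }
}
```

With the default parser the empty element has no child node at all, so getFirstChild() returns null and the chained getNodeValue() call in the loop above throws.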

How should I handle this empty case? I have found a lot of information on removing blank text nodes, but I still need to read the blank nodes as empty strings. Ideally, I would like to avoid using a special character to denote a blank string.

Thank you in advance for your time.

asked Jan 22 '23 05:01 by MysteryMoose



2 Answers

if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {

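nodeValue shouldn't be null; for an empty element it is firstChild itself that is null, because the element contains no text node at all. Calling getNodeValue() on that null child is what throws the NullPointerException, so check firstChild instead: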

Node firstChild = xmlNodeList.item(j).getFirstChild();
arrInputStrings.add(firstChild == null ? "" : firstChild.getNodeValue());
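A self-contained version of that null check, as a minimal sketch: it parses the XML from a string rather than using the question's xmlMachine element, and the class name, method name and constant values are hypothetical stand-ins for the question's code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FirstChildCheck {
    // Hypothetical stand-ins for the question's tag-name constants
    static final String XML_INPUT_STRING_LIST = "InputStringList";
    static final String XML_INPUT_STRING = "InputString";

    public static List<String> readInputStrings(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element list = (Element) doc.getElementsByTagName(XML_INPUT_STRING_LIST).item(0);
        NodeList nodes = list.getElementsByTagName(XML_INPUT_STRING);
        List<String> arrInputStrings = new ArrayList<>();
        for (int j = 0; j < nodes.getLength(); j++) {
            // An empty element has no child node, so test the child, not its value
            Node firstChild = nodes.item(j).getFirstChild();
            arrInputStrings.add(firstChild == null ? "" : firstChild.getNodeValue());
        }
        return arrInputStrings;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<InputStringList>"
                + "<InputString></InputString>"
                + "<InputString>000</InputString>"
                + "<InputString>111</InputString>"
                + "</InputStringList>";
        System.out.println(readInputStrings(xml)); // [, 000, 111]
    }
}
```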

However, note that this still assumes the content is a single text node. If the element contained another element, or some text followed by a CDATA section, the value of the first child alone would not give you the whole text.
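To illustrate the limitation, here is a minimal sketch that compares the first child's value with the element's full text for an element holding a text node followed by a CDATA section. It assumes the default DocumentBuilderFactory settings, where coalescing is off so the CDATA section stays a separate child node; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FirstChildVsTextContent {
    // Returns {value of the first child, text content of the whole element}
    public static String[] compare(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node element = doc.getDocumentElement();
        return new String[] { element.getFirstChild().getNodeValue(), element.getTextContent() };
    }

    public static void main(String[] args) throws Exception {
        // A text node "01" followed by a CDATA section "001"
        String[] r = compare("<InputString>01<![CDATA[001]]></InputString>");
        System.out.println("first child only: " + r[0]); // 01
        System.out.println("textContent:      " + r[1]); // 01001
    }
}
```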

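What you really want is the textContent property from DOM Level 3 Core. It gives you all the text inside the element, however it is contained, and it is the empty string when the element has no children, so the empty InputString needs no special case: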

arrInputStrings.add(xmlNodeList.item(j).getTextContent());

This method is available from Java 1.5 onwards.
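Putting the getTextContent() approach into a self-contained sketch: it reads every InputString into a list, with the empty element coming back as "" and no null check needed. As in the question, tags are looked up by name; the constant values, class name and method name here are hypothetical, and the XML is parsed from a string for brevity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextContentReader {
    // Hypothetical stand-ins for the question's tag-name constants
    static final String XML_INPUT_STRING_LIST = "InputStringList";
    static final String XML_INPUT_STRING = "InputString";

    public static List<String> readInputStrings(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element list = (Element) doc.getElementsByTagName(XML_INPUT_STRING_LIST).item(0);
        NodeList nodes = list.getElementsByTagName(XML_INPUT_STRING);
        List<String> arrInputStrings = new ArrayList<>();
        for (int j = 0; j < nodes.getLength(); j++) {
            // getTextContent() concatenates all text and CDATA inside the element
            // and returns "" when the element has no children
            arrInputStrings.add(nodes.item(j).getTextContent());
        }
        return arrInputStrings;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<InputStringList>"
                + "<InputString></InputString>"
                + "<InputString>000</InputString>"
                + "<InputString>01<![CDATA[001]]></InputString>"
                + "</InputStringList>";
        System.out.println(readInputStrings(xml)); // [, 000, 01001]
    }
}
```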

answered Jan 23 '23 20:01 by bobince



You could use a library like jOOX to simplify standard DOM manipulation in general. With jOOX, you would get the list of strings as follows:

import static org.joox.JOOX.$;

List<String> strings = $(xmlMachine).find(XML_INPUT_STRING_LIST)
                                    .find(XML_INPUT_STRING)
                                    .texts();
answered Jan 23 '23 19:01 by Lukas Eder
