
getNodeName() operation on an XML node returns #text

<person>
<firstname>...</firstname>
<lastname>...</lastname>
<salary>...</salary>
</person>

This is the XML I am parsing. When I print the node names of the child nodes of person, I get:

#text
firstname
#text
lastname
#text
salary

How do I stop these #text nodes from appearing?
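To see where the #text entries come from, here is a minimal sketch, assuming the JDK's built-in DOM parser, that parses a compact version of the XML from a string and prints each child of the root element, showing the content of any text node. The class WhitespaceNodesDemo and the method describeChildren are hypothetical names for illustration. The line breaks between the elements are themselves text nodes, and the DOM specification defines the node name of every Text node as "#text".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class: shows that whitespace between elements becomes #text nodes.
class WhitespaceNodesDemo {

    // Lists every child of the root element; text nodes also show their
    // content, with newlines escaped so the whitespace is visible.
    static List<String> describeChildren(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        List<String> out = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                out.add(child.getNodeName() + " [" + child.getNodeValue().replace("\n", "\\n") + "]");
            } else {
                out.add(child.getNodeName());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<person>\n<firstname/>\n<lastname/>\n<salary/>\n</person>";
        // One line per child node: whitespace text nodes appear between the elements
        describeChildren(xml).forEach(System.out::println);
    }
}
```

In this sketch there is also a trailing #text node after salary, produced by the newline before </person>.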

Update: here is my code.

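import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class PrintChildNames {
    public static void main(String[] args) {
        try {
            File fXmlFile = new File("file.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            // Configure the factory first: a DocumentBuilder takes its settings
            // from the factory when newDocumentBuilder() is called.
            dbFactory.setValidating(false);
            dbFactory.setIgnoringElementContentWhitespace(true);
            dbFactory.setNamespaceAware(true);
            dbFactory.setIgnoringComments(true);
            dbFactory.setCoalescing(true);
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();

            Document doc;
            // try-with-resources closes the input stream once parsing is done
            try (InputStream in = new FileInputStream(fXmlFile)) {
                doc = dBuilder.parse(in);
            }
            doc.getDocumentElement().normalize();
            Node n = doc.getDocumentElement();

            System.out.println(dbFactory.isIgnoringElementContentWhitespace());
            System.out.println(n);

            if (n != null && n.hasChildNodes()) {
                NodeList nl = n.getChildNodes();
                // Print the name of every child node of the root element
                for (int i = 0; i < nl.getLength(); i++) {
                    System.out.println(nl.item(i).getNodeName());
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}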
asked Oct 10 '12 10:10 by coder


1 Answer

setIgnoringElementContentWhitespace only works if you also call setValidating(true), and even then only if the XML file you are parsing references or embeds a DTD that the parser can use to work out which whitespace-only text nodes are actually ignorable. If your document doesn't have a DTD, the parser errs on the safe side and assumes that no text nodes can be ignored, so you'll have to write your own code to skip them as you traverse the child nodes.
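Both routes described above can be sketched with the JDK's built-in DOM parser. This is a minimal illustration under stated assumptions, not a definitive implementation: the class IgnoreWhitespaceNodes, its helper methods, and the internal DTD are hypothetical, and the empty elements stand in for whatever values the real file holds. Without a DTD, elementChildNames keeps only children whose getNodeType() is Node.ELEMENT_NODE; with an internal DTD that declares person as element-only content and validation switched on, the parser itself drops the whitespace.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical class illustrating both ways of getting rid of #text nodes.
class IgnoreWhitespaceNodes {

    // Parses an XML string; the factory is configured before the builder is created.
    static Document parse(String xml, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            factory.setIgnoringElementContentWhitespace(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Names of all child nodes, including "#text" for text nodes.
    static List<String> allChildNames(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    // No-DTD route: skip every child that is not an element while traversing.
    static List<String> elementChildNames(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    static final String BODY =
            "<person>\n    <firstname/>\n    <lastname/>\n    <salary/>\n</person>";

    // Same document with a hypothetical internal DTD giving person element-only content.
    static final String WITH_DTD =
            "<!DOCTYPE person [\n"
            + "  <!ELEMENT person (firstname, lastname, salary)>\n"
            + "  <!ELEMENT firstname (#PCDATA)>\n"
            + "  <!ELEMENT lastname (#PCDATA)>\n"
            + "  <!ELEMENT salary (#PCDATA)>\n"
            + "]>\n" + BODY;

    public static void main(String[] args) {
        Node plainRoot = parse(BODY, false).getDocumentElement();
        // Without a DTD the whitespace text nodes are kept
        System.out.println(allChildNames(plainRoot));
        // Filtering while traversing leaves only the elements
        System.out.println(elementChildNames(plainRoot));

        Node dtdRoot = parse(WITH_DTD, true).getDocumentElement();
        // With a DTD and validation, the parser discards the ignorable whitespace
        System.out.println(allChildNames(dtdRoot));
    }
}
```

The filtering helper is the more portable choice, since it does not depend on the document carrying a DTD. If you do rely on the parser, configure the factory before calling newDocumentBuilder(); changes made to the factory afterwards do not affect a builder that already exists.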

answered Oct 12 '22 12:10 by Ian Roberts