 

How to count leaf elements in an XML file in Java

Tags:

java

xml

I want to count all leaf elements in an XML file in Java. Suppose my XML structure looks like the example below; I want to count all name and id elements in this file. How do I do this?

XML sample:

<set>
  <employee>
    <name> </name>
    <id></id>
  </employee>
  <employee>
    <name> </name>
    <id></id>
  </employee>
</set>

Attempted Java Code:

try {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(file.toFile());
    Element root = document.getDocumentElement();
    if (!root.hasChildNodes()) {
        paths.add(file);
    } else {
        System.out.println("Element Name in: "+file.getFileName());
        System.out.println("Root element: " + "Total count: " + root.getChildNodes().getLength());
        for (int i = 0; i < root.getChildNodes().getLength(); i++) {
            Node node = root.getChildNodes().item(i);
            if (node.getChildNodes().getLength() != 0) {
                System.out.println("name: " + node.getNodeName() + " size:"+ node.getChildNodes().getLength());
            }
        }
    }
} catch (ParserConfigurationException | SAXException e) {
    e.printStackTrace();
}
asked Jun 22 '26 19:06 by Hadi J


2 Answers

NOTE: This answer is about counting the number of elements with particular known names (name and id). The question has since been changed to request counting leaf elements, which this answer does not cover.

To perform a full depth-first search of an XML document, you have a choice of methods.

If you only need to perform the search and nothing else, a StAX parser is the best choice for both performance and memory footprint, because it streams through the document without building an in-memory tree.

Otherwise, a DOM parser is likely your best choice.

If you don't want to traverse the XML tree yourself, you can use XPath to do it for you.

Here is an example of all three, with test code:

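import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

private static int countUsingStAX(String xml) throws XMLStreamException {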
    int count = 0;
    XMLInputFactory factory = XMLInputFactory.newFactory();
    XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
    while (reader.hasNext()) {
        int event = reader.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
            String name = reader.getLocalName();
            if (name.equals("name") || name.equals("id"))
                count++;
        }
    }
    reader.close();
    return count;
}

private static int countUsingDOM(String xml) throws Exception {
    int count = 0;
    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder domBuilder = domFactory.newDocumentBuilder();
    Document document = domBuilder.parse(new InputSource(new StringReader(xml)));
    Node node = document.getDocumentElement();
    while (node != null) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String name = node.getNodeName();
            if (name.equals("name") || name.equals("id"))
                count++;
        }
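        // Pre-order walk: descend into the first child if there is one...
        if (node.getFirstChild() != null)
            node = node.getFirstChild();
        else {
            // ...otherwise climb back up until the current node has a next sibling;
            // node becomes null after climbing past the Document node, ending the loop
            while (node != null && node.getNextSibling() == null)
                node = node.getParentNode();
            if (node != null)
                node = node.getNextSibling();
        }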
    }
    return count;
}

private static int countUsingXPath(String xml) throws XPathException {
    String xpathExpr = "//*[self::name or self::id]";
    XPathFactory factory = XPathFactory.newInstance();
    XPath xPath = factory.newXPath();
    NodeList nodeList = (NodeList)xPath.evaluate(xpathExpr,
                                                 new InputSource(new StringReader(xml)),
                                                 XPathConstants.NODESET);
    return nodeList.getLength();
}

public static void main(String[] args) throws Exception {
    String xml = "<set>\r\n" +
                 " <employee>\r\n" +
                 "    <name> </name>\r\n" +
                 "    <id></id>\r\n" +
                 " </employee>\r\n" +
                 " <employee>\r\n" +
                 "     <name> </name>\r\n" +
                 "     <id></id>\r\n" +
                 "  </employee>\r\n" +
                 "</set>";
    System.out.println(countUsingStAX(xml));
    System.out.println(countUsingDOM(xml));
    System.out.println(countUsingXPath(xml));
}

All three print the number 4.

DOM traversal could also be done using recursion, e.g. using getChildNodes().
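A minimal sketch of that recursive variant, still counting name and id elements like the methods above. The class and method names (RecursiveDomCount, countNamed, countUsingRecursiveDOM) are hypothetical, chosen only for illustration; the method visits a node, checks whether it is a matching element, and then recurses into every entry of getChildNodes(), including text nodes, which simply contribute nothing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RecursiveDomCount {

    // Counts this node plus all its descendants that are elements named "name" or "id".
    static int countNamed(Node node) {
        int count = 0;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String name = node.getNodeName();
            if (name.equals("name") || name.equals("id"))
                count++;
        }
        // Recurse into every child node; non-element children add 0.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++)
            count += countNamed(children.item(i));
        return count;
    }

    static int countUsingRecursiveDOM(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return countNamed(document.getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        String xml = "<set><employee><name> </name><id></id></employee>"
                   + "<employee><name> </name><id></id></employee></set>";
        System.out.println(countUsingRecursiveDOM(xml)); // prints 4
    }
}
```

Recursion keeps the code short, but very deeply nested documents could in principle exhaust the call stack, which the iterative walk in countUsingDOM avoids.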

answered Jun 25 '26 11:06 by Andreas


XPath is the best way to do this. You can use two slashes in an XPath expression to search at all levels:

XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xpath.evaluate("//name|//id", document, 
    XPathConstants.NODESET);
int count = nodes.getLength();

Update:

Now that the question asks how to count leaf elements regardless of element name, the XPath expression should be //*[not(*)], which selects every element that has no child elements:

XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xpath.evaluate("//*[not(*)]", document, 
    XPathConstants.NODESET);
int count = nodes.getLength();
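For completeness, here is a self-contained sketch that runs the leaf-element expression against the sample document, alongside a streaming StAX alternative in the spirit of the other answer's advice. The StAX method is not from either answer; it is an assumption-labeled illustration that treats an element as a leaf when no START_ELEMENT event occurs between its START_ELEMENT and END_ELEMENT, which matches the meaning of not(*) (text content is allowed). The class and method names (LeafCount, countLeavesXPath, countLeavesStAX) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LeafCount {

    // XPath: //*[not(*)] selects every element that has no child elements.
    static int countLeavesXPath(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("//*[not(*)]", document,
                XPathConstants.NODESET);
        return nodes.getLength();
    }

    // StAX (hypothetical variant): an element is a leaf if its END_ELEMENT
    // follows its START_ELEMENT with no other START_ELEMENT in between.
    // Text, comments and other events are ignored.
    static int countLeavesStAX(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        boolean noChildYet = false; // true right after a START_ELEMENT
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    noChildYet = true;
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (noChildYet)
                        count++; // closed without any child element: a leaf
                    noChildYet = false; // the enclosing element had a child
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<set><employee><name> </name><id></id></employee>"
                   + "<employee><name> </name><id></id></employee></set>";
        System.out.println(countLeavesXPath(xml)); // prints 4
        System.out.println(countLeavesStAX(xml));  // prints 4
    }
}
```

The XPath version is the shortest to write, while the StAX version never builds a tree, so it should scale better for large files.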
answered Jun 25 '26 09:06 by VGR


