How to get first-level children of an element in jsoup

Tags:

java

jsoup

In jsoup, Element.children() returns all children (descendants) of an Element. But I want only the Element's first-level children (direct children).

Which method can I use?

user1777220 asked Apr 17 '13 13:04




2 Answers

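Element.children() returns only the direct child elements. Because the returned elements are still attached to the document tree, each one keeps its own children, so printing them prints their whole subtrees.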
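To make the difference visible, here is a small sketch (the class and method names are made up for illustration) that compares Element.children(), which gives direct children only, with Element.getAllElements(), which gives the element itself plus every descendant, on the same small div.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class ChildrenVsDescendants {

    // Comma-separated tag names of the direct children of the first <div>
    public static String directChildTags(String html) {
        Element div = Jsoup.parse(html).select("div").first();
        StringBuilder sb = new StringBuilder();
        for (Element child : div.children()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(child.tagName());
        }
        return sb.toString();
    }

    // getAllElements() counts the <div> itself plus all of its descendants
    public static int allElementCount(String html) {
        return Jsoup.parse(html).select("div").first().getAllElements().size();
    }

    public static void main(String[] args) {
        String html = "<div><ul><li>11</li><li>22</li></ul><p>ppp<span>sp</span></p></div>";
        System.out.println(directChildTags(html)); // ul,p
        System.out.println(allElementCount(html)); // 6 (div, ul, li, li, p, span)
    }
}
```

So the nested li and span elements only show up through getAllElements(), not through children().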

If you need the direct child elements without their underlying subtrees, you have to create detached copies of them, for example:

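import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Tag;
import org.jsoup.select.Elements;

public class DirectChildren {

    public static void main(String... args) {

        Document document = Jsoup
                .parse("<div><ul><li>11</li><li>22</li></ul><p>ppp<span>sp</span></p></div>");

        Element div = document.select("div").first();
        // children() returns only the direct child elements: <ul> and <p>
        Elements divChildren = div.children();

        // Copy each child's tag and attributes into a new Element with no children and no parent
        Elements detachedDivChildren = new Elements();
        for (Element elem : divChildren) {
            Element detachedChild = new Element(Tag.valueOf(elem.tagName()),
                    elem.baseUri(), elem.attributes().clone());
            detachedDivChildren.add(detachedChild);
        }

        System.out.println(divChildren.size());
        for (Element elem : divChildren) {
            System.out.println(elem.tagName());
        }

        // The attached children still print their whole subtrees
        System.out.println("\ndivChildren content: \n" + divChildren);

        // The detached copies print as empty elements
        System.out.println("\ndetachedDivChildren content: \n"
                + detachedDivChildren);
    }
}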

Output

2
ul
p

divChildren content: 
<ul>
 <li>11</li>
 <li>22</li>
</ul>
<p>ppp<span>sp</span></p>

detachedDivChildren content: 
<ul></ul>
<p></p>
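If your jsoup version provides Element.shallowClone() (older releases may not have it; in that case the constructor-based copy above does the same job), the detached copies can be made more directly. This is a sketch under that assumption; the class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ShallowChildren {

    // Returns detached, childless copies of the direct children of the first <div>
    public static Elements shallowChildren(String html) {
        Element div = Jsoup.parse(html).select("div").first();
        Elements copies = new Elements();
        for (Element child : div.children()) {
            // shallowClone(): same tag and attributes, but no children and no parent
            copies.add(child.shallowClone());
        }
        return copies;
    }

    public static void main(String[] args) {
        String html = "<div><ul class=\"list\"><li>11</li><li>22</li></ul>"
                + "<p>ppp<span>sp</span></p></div>";
        for (Element copy : shallowChildren(html)) {
            System.out.println(copy.tagName() + " children=" + copy.childNodeSize()
                    + " attached=" + (copy.parent() != null));
        }
    }
}
```

The attributes (here class="list") are kept on the copy, while the li and span content is not.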
Vitaly answered Sep 28 '22 12:09


A CSS child combinator selector gives you the list of direct children of the parent node:

Elements firstLevelChildElements = doc.select("parent-tag > *");
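As a concrete sketch, with div standing in for parent-tag (the class and method names are made up): "div > *" matches every element whose parent is a div, so nested elements such as li or span are not matched. Note that when run on the whole document it matches the children of every div, so to target one specific parent you would use a more specific selector, for example one based on an id.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ChildCombinator {

    // "div > *": any element whose direct parent is a <div>
    public static Elements directChildrenOfDiv(String html) {
        return Jsoup.parse(html).select("div > *");
    }

    public static void main(String[] args) {
        String html = "<div><ul><li>11</li><li>22</li></ul><p>ppp<span>sp</span></p></div>";
        for (Element e : directChildrenOfDiv(html)) {
            System.out.println(e.tagName()); // ul, then p; li and span are not matched
        }
    }
}
```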

Alternatively, you can retrieve the parent element, get its first child element via child(int index), and then retrieve that child's siblings via siblingElements().

This gives you the list of first-level children excluding the child you started from, so you have to add that child to the list yourself.

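// parent: the Element whose direct children you want
Element firstChild = parent.child(0);
Elements firstLevelChildElements = firstChild.siblingElements();
firstLevelChildElements.add(0, firstChild);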
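A self-contained sketch of the sibling-based approach (class and method names are hypothetical) that also guards against a parent with no children, where child(0) would throw:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SiblingApproach {

    // Collects the direct children of 'parent' through its first child's siblings
    public static Elements viaSiblings(Element parent) {
        Elements result = new Elements();
        if (parent.children().isEmpty()) {
            return result; // no children: child(0) would throw IndexOutOfBoundsException
        }
        Element first = parent.child(0);
        result.add(first);                      // siblingElements() leaves out 'first' itself
        result.addAll(first.siblingElements()); // the remaining children, in document order
        return result;
    }

    // Comma-separated tag names of the direct children of the first <div> in 'html'
    public static String childTagsOfFirstDiv(String html) {
        Element div = Jsoup.parse(html).select("div").first();
        StringBuilder sb = new StringBuilder();
        for (Element e : viaSiblings(div)) {
            if (sb.length() > 0) sb.append(',');
            sb.append(e.tagName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<div><ul><li>11</li><li>22</li></ul><p>ppp<span>sp</span></p></div>";
        System.out.println(childTagsOfFirstDiv(html)); // ul,p
    }
}
```

In practice this returns the same elements as parent.children(); it is mainly useful when you already hold a reference to one of the children.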
Vasu Mangal answered Sep 28 '22 12:09