
How do I select a direct child of "this element" in jsoup?

Tags:

jsoup

If I have an element that looks like this:

<foo>
    <bar> bar text 1 </bar>
    <baz>
        <bar> bar text 2 </bar>
    </baz>
</foo>

If I already have the <foo> element selected and want to select only the <bar> element that is a direct child of <foo>, not the one nested inside <baz>, how do I specify that?

Element foo = ...; // the <foo> element shown above
foo.select("bar").text();

yields "bar text 1 bar text 2"
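A minimal reproduction of that behaviour, assuming a recent jsoup release on the classpath (it uses `Jsoup.parse` and `Element.selectFirst`); the class name `DescendantSelectDemo` is just for illustration. It shows that `select("bar")` searches all descendants of `foo`, not only its children:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class DescendantSelectDemo {
    // The first sample from the question, as an HTML string.
    static final String SAMPLE =
            "<foo>"
          + "  <bar> bar text 1 </bar>"
          + "  <baz><bar> bar text 2 </bar></baz>"
          + "</foo>";

    // select("bar") matches every <bar> below foo, including the one inside <baz>.
    static String allBarText(String html) {
        Element foo = Jsoup.parse(html).selectFirst("foo");
        return foo.select("bar").text();
    }

    public static void main(String[] args) {
        System.out.println(allBarText(SAMPLE)); // prints "bar text 1 bar text 2"
    }
}
```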

What I want is something like

foo.select("this > bar").text();

The question is: how do I specify "this element" in the selector?

Note that the desired bar might not be first; I need a solution that also works for:

<foo>
    <baz>
        <bar> bar text 2 </bar>
    </baz>
    <bar> bar text 1 </bar>
</foo>
asked Apr 28 '15 at 21:04 by PurpleVermont


2 Answers

I believe you want:

foo.select("> bar").text();

See the Combinators section of the jsoup Selectors page:

E > F     an F direct child of E
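A runnable sketch that checks `> bar` against both layouts from the question. It assumes a jsoup version whose selector parser accepts a query that starts with a combinator (current releases do); the class and method names are hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class LeadingCombinatorDemo {
    // Both layouts from the question: direct <bar> first, and direct <bar> last.
    static final String BAR_FIRST =
            "<foo><bar> bar text 1 </bar><baz><bar> bar text 2 </bar></baz></foo>";
    static final String BAR_LAST =
            "<foo><baz><bar> bar text 2 </bar></baz><bar> bar text 1 </bar></foo>";

    // A query that starts with ">" is anchored at the element select() is called on,
    // so only <bar> elements whose parent is foo itself match.
    static String directBarText(String html) {
        Element foo = Jsoup.parse(html).selectFirst("foo");
        return foo.select("> bar").text();
    }

    public static void main(String[] args) {
        System.out.println(directBarText(BAR_FIRST)); // bar text 1
        System.out.println(directBarText(BAR_LAST));  // bar text 1
    }
}
```

Because the leading `>` is evaluated relative to `foo`, the position of the direct `<bar>` among its siblings does not matter.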
answered Jan 01 '23 at 23:01 by Dan Dar3


Use the :root structural pseudo-selector to specify "this element". The Element.select Javadoc says that select uses "this element as the starting context" and can match "this element, or any of its children"; that is, :root refers to this element, not to the actual document root. The following code demonstrates this by nesting the second example inside some outer tags:

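import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

//nest your second sample in some fake outer html body
Element html = (Element)Parser.parseFragment("<html><body><foo>\n" +
                "    <baz>\n" +
                "        <bar> bar text 2 </bar>\n" +
                "    </baz>\n" +
                "    <bar> bar text 1 </bar>\n" +
                "</foo></body></html>", null, "http://example.com").get(0);
// foo becomes "this element" for the select() call below
Element foo = html.select("foo").first();

// :root matches foo itself, so only foo's direct <bar> children are selected
System.out.println(foo.select(":root > bar"));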

This code prints

<bar>
  bar text 1 
</bar>

correctly skipping the nested bar element.
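To check that `:root` really means the context element here, a small sketch can select `:root` from `foo` directly. It assumes a recent jsoup release and uses `Jsoup.parse` instead of `Parser.parseFragment`; the class and method names are hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class RootContextDemo {
    // The second sample, wrapped in outer <html><body> tags as in the answer.
    static final String SAMPLE =
            "<html><body><foo><baz><bar> bar text 2 </bar></baz>"
          + "<bar> bar text 1 </bar></foo></body></html>";

    static Element foo() {
        return Jsoup.parse(SAMPLE).selectFirst("foo");
    }

    // Inside foo.select(...), :root matches foo, not the document's <html> element.
    static String rootTagName() {
        Elements roots = foo().select(":root");
        return roots.first().tagName();
    }

    // :root > bar therefore keeps only the <bar> whose parent is foo.
    static String directBarText() {
        return foo().select(":root > bar").text();
    }

    public static void main(String[] args) {
        System.out.println(rootTagName());   // foo
        System.out.println(directBarText()); // bar text 1
    }
}
```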

According to the jsoup changelog, support for structural pseudo-selectors was added in 1.7.2.

answered Jan 01 '23 at 21:01 by Jeffrey Bosboom