
Jsoup find element with specific text

I want to select an element containing specific text from some HTML using jsoup. The HTML is:

<td style="vertical-align:bottom;text-align:center;width:15%">
<div style="background-color:#FFDD93;font-size:10px;margin:5px auto 0px auto;text-align:left;" class="genbg"><span class="corners-top-subtab"><span></span></span>
    <div><b>Pantry/Catering</b>
        <div>
            <div style="color:#00700B;">&#10003;&nbsp;Pantry Car Avbl
                <br />&#10003;&nbsp;Catering Avbl</div>
        </div>
        <div>
            <div><span>Dinner is served after departure from NZM on 1st day.;</span>...
                <br /><a style="font-size:10px;color:Red;" onClick="expandPost($(this).parent());" href="javascript:void(0);">Read more...</a>
            </div>
            <div style="display:none;">Dinner :2 chapati, rice, dal and chicken curry (NV) and paneer curry in veg &amp;Ice cream.; Breakfast:2 bread slices with jam and butter. ; Omlet of 2 eggs (Non veg),vada and sambar(veg)..; coffee &amp; lime juice</div>
        </div>
    </div><span class="corners-bottom-subtab"><span></span></span>
</div>

I want to find the div element containing the text "Pantry/Catering". I tried

doc.select("div:contains(Pantry/Catering)").first();

But this doesn't seem to work. How can I get this element using jsoup?

tbag asked Aug 27 '14 01:08


3 Answers

This should also work for you:

doc.selectFirst("div:containsOwn(Pantry/Catering)").text();

Explanation:

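selectFirst(selector) - Returns the first matching element, or null if nothing matches, so you can skip select().first().

containsOwn(text) - A pseudo-selector that matches elements that directly contain the specified text. Unlike contains(text), the text must appear in the element itself, not in any of its descendants. In the question's markup the text sits inside a <b> child, so the selector has to target b rather than div for containsOwn to match.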

Source : https://jsoup.org/apidocs/org/jsoup/select/Selector.html#selectFirst-java.lang.String-org.jsoup.nodes.Element-
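To make the difference concrete, here is a minimal sketch that counts matches for both pseudo-selectors. The HTML string is a cut-down stand-in for the question's markup and the class and method names are made up for illustration; it assumes a jsoup version on the classpath that supports these selectors.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ContainsVsContainsOwn {
    // Cut-down stand-in for the question's markup (an assumption, not the full page).
    static final String HTML =
        "<div class=\"genbg\">"
      + "<div><b>Pantry/Catering</b><div>Pantry Car Avbl</div></div>"
      + "</div>";

    /** Counts how many elements in the sample markup match the given selector. */
    public static int countMatches(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        // :contains checks the element's text including all descendants,
        // so every div that wraps the <b> matches.
        System.out.println(countMatches("div:contains(Pantry/Catering)"));
        // :containsOwn checks only text nodes directly inside the element;
        // here the text belongs to <b>, not to any div.
        System.out.println(countMatches("div:containsOwn(Pantry/Catering)"));
        System.out.println(countMatches("b:containsOwn(Pantry/Catering)"));
    }
}
```

If the div variant matches nothing, selectFirst returns null, so it is worth checking the result before calling text() on it.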

harshainfo answered Oct 17 '22 02:10


When I run your code it selects the outer div, whereas I presume you are looking for the inner div. The documentation says that :contains selects the "elements that contains the specified text". In this simple HTML:

<div><div><b>Pantry/Catering</b></div></div>

The selector div:contains(Pantry/Catering) matches twice because both divs contain the text 'Pantry/Catering':

<!-- First Match -->
<div><div><b>Pantry/Catering</b></div></div>

<!-- Second Match -->
<div><b>Pantry/Catering</b></div>

The matches always come back in that order because jsoup returns results in document order, so an ancestor precedes its descendants. Therefore .first() always returns the outer div. To extract the inner div you could use .get(1).

Extracting the inner div in full:

doc.select("div:contains(Pantry/Catering)").get(1)
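A hard-coded index depends on how deeply the target is nested. As a rough sketch, assuming the text occurs only once in the document, the last match is the innermost one, so last() can stand in for get(1). The ids below are added purely for illustration (the sample markup above has none), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class MatchOrder {
    // Same shape as the sample above; the ids are an illustrative addition.
    static final String HTML =
        "<div id=\"outer\"><div id=\"inner\"><b>Pantry/Catering</b></div></div>";

    /** Returns the ids of all divs matched by :contains, in the order jsoup reports them. */
    public static List<String> matchedIds() {
        Elements matches = Jsoup.parse(HTML).select("div:contains(Pantry/Catering)");
        List<String> ids = new ArrayList<>();
        for (Element el : matches) {
            ids.add(el.id());
        }
        return ids;
    }

    /** Returns the id of the last match, or null if nothing matched. */
    public static String innermostId() {
        Elements matches = Jsoup.parse(HTML).select("div:contains(Pantry/Catering)");
        Element last = matches.last(); // null when the result set is empty
        return last == null ? null : last.id();
    }

    public static void main(String[] args) {
        System.out.println(matchedIds());   // outer div first, then inner div
        System.out.println(innermostId());
    }
}
```

If the text can appear in several places, last() picks the innermost div of the final occurrence only, so a more specific selector is the safer choice there.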

Spectre answered Oct 17 '22 03:10


OK, figured it out. I had to do something like:

doc.select("b:contains(Pantry/Catering)").first().parent().children().get(1).text();

Thanks for the help!
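The chain above throws if the b element or its sibling is missing. A null-safe variant might look like the sketch below. It swaps parent().children().get(1) for nextElementSibling(), which is equivalent only when the b is the first child, as it is in the question's markup. The class name, method name, and SAMPLE string are hypothetical.

```java
import java.util.Optional;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PantryText {
    // Cut-down stand-in for the question's markup (an assumption, not the full page).
    static final String SAMPLE =
        "<div><b>Pantry/Catering</b>"
      + "<div>Pantry Car Avbl</div>"
      + "<div>Dinner is served after departure.</div></div>";

    /**
     * Returns the text of the element that directly follows the <b> label,
     * or an empty Optional if the label or its sibling is missing.
     * Note: the label is pasted into the selector, so it must not contain
     * unbalanced parentheses.
     */
    public static Optional<String> textAfterLabel(String html, String label) {
        Document doc = Jsoup.parse(html);
        Element bold = doc.selectFirst("b:contains(" + label + ")");
        if (bold == null) {
            return Optional.empty();
        }
        Element next = bold.nextElementSibling();
        if (next == null) {
            return Optional.empty();
        }
        return Optional.of(next.text());
    }

    public static void main(String[] args) {
        System.out.println(textAfterLabel(SAMPLE, "Pantry/Catering").orElse("(not found)"));
        System.out.println(textAfterLabel(SAMPLE, "No such label").orElse("(not found)"));
    }
}
```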

tbag answered Oct 17 '22 03:10