I have the following HTML. Using Jsoup, I'm trying to extract the text of the p element that has no attributes (the text "Some text 2", not "Some text 1").
<div id="intro">
    <h1 class="some class">
    <p id="some_id">
        Some text 1
    </p>
    <p>
        Some text 2
    </p>
</div> 
I tried using the following Jsoup expression:
div[id=intro] > p:not(:has(@*))
But it doesn't work. Thanks for your help.
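I think you can use the Jsoup CSS selector p:not([^]). The [^attrPrefix] form matches elements that have an attribute whose name starts with the given prefix. With an empty prefix it matches any attribute, so p:not([^]) selects every p that has no attributes at all.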
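import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Test markup: three p elements with attributes, one without.
String html = "<div id=\"intro\">"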
        + "<h1 class=\"some class\">"
        + "<p id=\"some_id\">"
        +   "Some text 1"
        + "</p>"
        + "<p name=\"some_name\">"
        +   "Some text A"
        + "</p>"
        + "<p data>"
        +   "Some text B"
        + "</p>"
        + "<p>"
        +   "Some text 2"
        + "</p>"
        + "</div> ";
// Parse the markup and select every p that has no attributes.
Document doc = Jsoup.parse(html);
Elements els = doc.select("p:not([^])");
for (Element el : els) {
    System.out.println(el.text());
}
The example above prints only
Some text 2
because that is the only p element without attributes.
Note that the selector p[^] will pick all p elements that do have an attribute.
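If the empty-prefix selector is rejected by the Jsoup version you are using (I have not checked every release), a fallback is to select all p elements and filter them in Java by attribute count. This is only a minimal sketch: the class name NoAttributeParagraphs and the helper textOfBareParagraphs are made up for illustration, and the markup is a shortened version of the test HTML above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class NoAttributeParagraphs {

    // Hypothetical helper: returns the text of every <p> that carries no attributes.
    static List<String> textOfBareParagraphs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element p : doc.select("p")) {
            // attributes() is never null; size() == 0 means no attributes at all.
            if (p.attributes().size() == 0) {
                result.add(p.text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<div id=\"intro\">"
                + "<p id=\"some_id\">Some text 1</p>"
                + "<p data>Some text B</p>"
                + "<p>Some text 2</p>"
                + "</div>";
        System.out.println(textOfBareParagraphs(html));
    }
}
```

The filter does the same job as p:not([^]) but does not depend on how the selector parser treats an empty prefix, at the cost of a few more lines.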