Consider an HTML document like this one:
<div>
<p>...</p>
<p>...</p>
...
<p class="random_class_name">...</p>
...
</div>
How could we select all of the p elements except the p element with the class random_class_name?
An HTML element consists of a tag name, attributes, and child nodes (including text nodes and other elements). From an Element, you can extract data, traverse the node graph, and manipulate the HTML.
XPath expressions can also be used to select elements within the HTML, with jsoup acting as the HTML parser.
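For comparison, here is a minimal sketch of the same two exclusions written as XPath expressions. To keep it dependency-free it does not use jsoup at all: it relies on the JDK's built-in javax.xml.xpath and therefore assumes the markup is well-formed XML, which real-world HTML often is not. The class name XPathNotDemo, the helper count, and the sample markup are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathNotDemo {
    // Hypothetical sample markup, modelled on the snippet in the question.
    static final String SAMPLE =
        "<div>"
        + "<p>first</p>"
        + "<p>second</p>"
        + "<p class=\"random_class_name\">third</p>"
        + "<p class=\"other\">fourth</p>"
        + "</div>";

    // p elements whose class list does not contain random_class_name.
    static final String NOT_THAT_CLASS =
        "//p[not(contains(concat(' ', normalize-space(@class), ' '), ' random_class_name '))]";

    // p elements that carry no class attribute at all.
    static final String NO_CLASS = "//p[not(@class)]";

    // Parses the markup as XML and returns how many nodes the expression matches.
    static int count(String markup, String expression) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(SAMPLE, NOT_THAT_CLASS)); // 3: first, second, fourth
        System.out.println(count(SAMPLE, NO_CLASS));       // 2: first, second
    }
}
```

The two counts differ because the second expression also drops the p with class "other": it filters on the presence of any class attribute, not on one specific class name.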
jsoup can parse HTML files, input streams, URLs, or even strings. It eases data extraction from HTML by offering Document Object Model (DOM) traversal methods and CSS and jQuery-like selectors. jsoup can manipulate the content: the HTML element itself, its attributes, or its text.
You can use the :not pseudo-selector:
Elements ps = body.select("p:not(.random_class_name)");
If the class name is not known, you can still use a similar expression:
Elements ps = body.select("p:not([class])");
In the second example I use the attribute selector [], in the first the normal syntax for classes. Note that p:not([class]) skips every p that has a class attribute, whatever its value.
See the jsoup documentation on CSS selectors.
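// Requires org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element, org.jsoup.select.Elements
Document doc = Jsoup.parse(htmlValue);
Elements pElements = doc.select("p");
for (Element element : pElements) {
    // "class" is a reserved word in Java, so the variable needs another name.
    // attr() returns an empty string, never null, when the attribute is missing.
    String className = element.attr("class");
    if (className.isEmpty()) {
        // no class attribute, or an empty one
        //.....
    } else {
        // the element has a class
        //.....
    }
}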