Use Jsoup to select an HTML element with no class

Consider an HTML document like this one:

<div>
    <p>...</p>
    <p>...</p>
    ...
    <p class="random_class_name">...</p>
    ...
</div>

How can we select all of the p elements except the one with the random_class_name class?

wginsberg asked Jul 15 '15 20:07



2 Answers

Elements ps = body.select("p:not(.random_class_name)");

This uses the :not pseudo-selector to exclude the p element that has the given class.

If the class name is not known in advance, a similar expression still works:

Elements ps = body.select("p:not([class])");

The first example uses the usual class syntax (.name). The second uses the attribute selector [], so it excludes every p element that has a class attribute, whatever its value.

See the jsoup documentation on CSS selectors.
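For a self-contained check, the two selectors can be run against markup modeled on the question. This is only a sketch: the class name NotSelectorDemo, the helper methods, and the paragraph texts are made up for illustration, and it assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class NotSelectorDemo {
    // Sample markup modeled on the question; the paragraph texts are invented.
    static final String HTML =
        "<div>"
        + "<p>first</p>"
        + "<p>second</p>"
        + "<p class=\"random_class_name\">third</p>"
        + "</div>";

    // Excludes only the paragraphs that carry the given class name.
    static Elements withoutClass(Document doc, String className) {
        return doc.select("p:not(." + className + ")");
    }

    // Excludes every paragraph that has a class attribute at all.
    static Elements withoutAnyClass(Document doc) {
        return doc.select("p:not([class])");
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(withoutClass(doc, "random_class_name").eachText()); // [first, second]
        System.out.println(withoutAnyClass(doc).eachText());                   // [first, second]
    }
}
```

On this input both selectors return the same two paragraphs. They would differ if another p element carried a different class: the first selector would keep it, the second would drop it.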

luksch answered Nov 03 '22 09:11


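import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.parse(htmlValue);
Elements pElements = doc.select("p");
for (Element element : pElements) {
    // "class" is a reserved word in Java, so the variable needs another name.
    // attr() returns an empty string, never null, when the attribute is missing.
    String className = element.attr("class");
    if (className.isEmpty()) {
        // p element without a class
    } else {
        // p element with a class
    }
}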
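One detail of the loop approach may matter: in jsoup, attr("class") returns an empty string both when the attribute is missing and when it is present but empty (class=""). If those two cases should be told apart, hasAttr("class") is one way to do it. The sketch below assumes that an empty class attribute still counts as "having a class"; the class name ClassLoopDemo, the helper method, and the sample markup are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ClassLoopDemo {
    // Collects the text of every <p> that has no class attribute at all.
    static List<String> textsWithoutClassAttr(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element p : doc.select("p")) {
            // hasAttr() is true even for class="", unlike an emptiness check on attr().
            if (!p.hasAttr("class")) {
                result.add(p.text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample markup covering the three cases.
        String html = "<div><p>plain</p><p class=\"\">empty</p><p class=\"x\">named</p></div>";
        System.out.println(textsWithoutClassAttr(html)); // [plain]
    }
}
```

This matches what the selector p:not([class]) returns, so the loop is mainly useful when each element needs extra per-element handling.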
ooozguuur answered Nov 03 '22 10:11