How can I traverse the HTML tree using Jsoup?

I think this question has been asked before, but I couldn't find anything.

Starting from the Document element in Jsoup, how can I traverse all the elements in the HTML content?

I was reading the documentation and thought about using the childNodes() method, but as far as I understand it only returns the nodes one level below. I think I could use recursion with this method, but I want to know whether there is a more appropriate/native way to do this.

Renato Dinhani asked Apr 11 '12 18:04


2 Answers

From a Document (or any other Node subclass), you can call the traverse(NodeVisitor) method, which walks the node and all of its descendants depth-first.

For example:

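import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

// head() is called when a node is first reached;
// tail() is called after all of its descendants have been visited.
document.traverse(new NodeVisitor() {
    @Override
    public void head(Node node, int depth) {
        System.out.println("Entering tag: " + node.nodeName());
    }
    @Override
    public void tail(Node node, int depth) {
        System.out.println("Exiting tag: " + node.nodeName());
    }
});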
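
To try this end to end, here is a minimal self-contained sketch. It assumes jsoup is on the classpath; the class name TraverseOutline, the helper outline() and the sample HTML are made up for illustration. It records each node's depth and name in document order, which shows how the depth argument reflects nesting.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

import java.util.ArrayList;
import java.util.List;

public class TraverseOutline {

    // Hypothetical helper: returns one "depth name" entry per node, in document order.
    static List<String> outline(String html) {
        Document document = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        document.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                // Called on the way in: the root Document is at depth 0.
                lines.add(depth + " " + node.nodeName());
            }

            @Override
            public void tail(Node node, int depth) {
                // Called on the way out; nothing to record in this sketch.
            }
        });
        return lines;
    }

    public static void main(String[] args) {
        // Sample input is an assumption, not taken from the question.
        for (String line : outline("<p>Hi <b>there</b></p>")) {
            System.out.println(line);
        }
    }
}
```

Text nodes are visited too and report the name #text, so filter on node instanceof Element if you only care about tags.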
Vivien Barousse answered Sep 21 '22 13:09


1) You can select every element in the document body using the * selector.

Elements elements = document.body().select("*");

2) To retrieve the text of each element individually, use the Element.ownText() method.

for (Element element : elements) {
  System.out.println(element.ownText());
}

3) To modify the content of an individual element, use Element.html(String strHtml). It clears any existing inner HTML in the element and replaces it with the parsed HTML.

element.html(strHtml);
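
Putting the three steps together, a rough sketch might look like the following. It assumes jsoup is on the classpath; the class name SelectAllExample, the helper wrapBold() and the choice to rewrite only b elements are hypothetical, picked just to have something concrete to modify.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectAllExample {

    // Hypothetical helper: wraps the own text of every <b> element in <i>.
    static Document wrapBold(String html) {
        Document document = Jsoup.parse(html);

        // Step 1: body plus every element nested inside it.
        Elements elements = document.body().select("*");

        for (Element element : elements) {
            // Step 2: ownText() is the element's direct text, without its children's text.
            String text = element.ownText();

            // Step 3: html(String) replaces the element's inner HTML.
            // The string is parsed as HTML, so plain text with < or & would need escaping.
            if (element.tagName().equals("b")) {
                element.html("<i>" + text + "</i>");
            }
        }
        return document;
    }

    public static void main(String[] args) {
        // Sample input is an assumption, not taken from the question.
        Document result = wrapBold("<p>Hello <b>world</b></p>");
        System.out.println(result.body().html());
    }
}
```

Changing the DOM inside the loop is safe here because select() has already collected the matches into a separate list.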

Hope this will help you. Thank you!

Gaurav Darji answered Sep 21 '22 13:09
