I think this question has been asked before, but I could not find anything.
From the Document element in Jsoup, how can I traverse all the elements in the HTML content?
I was reading the documentation and thinking about using the childNodes() method, but as far as I understand it only returns the nodes one level below. I think I could use some recursion with this method, but I want to know if there is a more appropriate/native way to do this.
XPath expressions can also be used to select elements within the HTML when Jsoup is the HTML parser.
By calling the jsoup methods from JavaScript or Python code, you can parse a web page or HTML string into a DOM model, then traverse the DOM and find the required elements.
From Document (and any Node subclass), you can use the traverse(NodeVisitor) method.
For example:
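document.traverse(new NodeVisitor() {
    @Override
    public void head(Node node, int depth) {
        // Called when a node is first visited, before its children.
        System.out.println("Entering tag: " + node.nodeName());
    }

    @Override
    public void tail(Node node, int depth) {
        // Called after all of the node's children have been visited.
        System.out.println("Exiting tag: " + node.nodeName());
    }
});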
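For a version that can be compiled and run on its own, here is a minimal sketch that uses the same traverse(NodeVisitor) call to collect the tag name of every element in document order. The class name TraverseSketch, the method elementNames, and the sample HTML are made up for illustration; it assumes jsoup is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

// Hypothetical class name, used only for this sketch.
public class TraverseSketch {

    // Collects the tag name of every Element in depth-first document order.
    static List<String> elementNames(String html) {
        Document document = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        document.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                // Keep elements only: skip text nodes, comments,
                // and the Document root node itself.
                if (node instanceof Element && !(node instanceof Document)) {
                    names.add(((Element) node).tagName());
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Nothing to do when leaving a node.
            }
        });
        return names;
    }

    public static void main(String[] args) {
        // Jsoup.parse adds the html, head and body wrappers around the fragment.
        System.out.println(elementNames("<p>Hello <b>world</b></p>"));
    }
}
```

The depth argument is ignored here, but it could be used to indent the output or to stop at a given level.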
1) You can select all elements of the document using the * selector.
Elements elements = document.body().select("*");
2) To retrieve the text of each element individually, use the Element.ownText() method.
for (Element element : elements) {
    System.out.println(element.ownText());
}
3) To modify the content of each element individually, use Element.html(String strHtml), which clears any existing inner HTML in the element and replaces it with the parsed HTML.
element.html(strHtml);
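Put together, the three steps above might look like the following sketch. The class name SelectAllSketch, the helper methods ownTexts and replaceInner, and the sample HTML are assumptions made for illustration; it assumes jsoup is on the classpath (selectFirst needs a reasonably recent version).

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical class name, used only for this sketch.
public class SelectAllSketch {

    // Steps 1 and 2: select every element under <body> (body included)
    // and record "tag: ownText" for each one.
    static List<String> ownTexts(String html) {
        Document document = Jsoup.parse(html);
        Elements elements = document.body().select("*");
        List<String> result = new ArrayList<>();
        for (Element element : elements) {
            result.add(element.tagName() + ": " + element.ownText());
        }
        return result;
    }

    // Step 3: replace the inner HTML of the first element matching cssQuery
    // and return its new text, or null when nothing matches.
    static String replaceInner(String html, String cssQuery, String newInnerHtml) {
        Document document = Jsoup.parse(html);
        Element target = document.selectFirst(cssQuery);
        if (target == null) {
            return null;
        }
        target.html(newInnerHtml);
        return target.text();
    }

    public static void main(String[] args) {
        String html = "<p>Hello<b>world</b></p>";
        System.out.println(ownTexts(html));
        System.out.println(replaceInner(html, "b", "<i>there</i>"));
    }
}
```

Note that ownText() returns only the text directly owned by an element, not the text of its children, so the paragraph and the bold element report their text separately.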
Hope this will help you. Thank you!