Can anyone please explain the difference between the Element object and the Node object provided in JSoup?
Which one is best used in which situation?
An HTML element consists of a tag name, attributes, and child nodes (including text nodes and other elements). From an Element, you can extract data, traverse the node graph, and manipulate the HTML.
With Jsoup as the HTML parser, elements within the HTML can also be selected using XPath expressions.
Cleaner.clean creates a new, clean document from the original dirty document, containing only elements allowed by the safelist. The original document is not modified. Only elements from the dirty document's body are used.
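As a rough illustration of the clean behaviour described above, here is a minimal sketch. It assumes a jsoup version recent enough to ship org.jsoup.safety.Safelist on the classpath; the class name CleanerSketch and the sample markup are made up for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Safelist;

// Hypothetical demo class, not part of jsoup.
class CleanerSketch {

    // Returns { <script> count in the dirty document, <script> count in the clean copy }.
    static int[] scriptCounts(String html) {
        Document dirty = Jsoup.parse(html);
        // Safelist.basic() permits simple text markup such as <p> and <b>, but not <script>.
        Document clean = new Cleaner(Safelist.basic()).clean(dirty);
        // The dirty document is inspected after cleaning to show it was left untouched.
        return new int[] { dirty.select("script").size(), clean.select("script").size() };
    }

    public static void main(String[] args) {
        int[] counts = scriptCounts("<p>Hi <b>there</b></p><script>alert(1)</script>");
        System.out.println("dirty scripts=" + counts[0] + ", clean scripts=" + counts[1]);
    }
}
```

The script element should still be present in the dirty document and absent from the clean one.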
A node is the generic name for any type of object in the DOM hierarchy.
An element is one specific type of node.
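To make the node/element distinction concrete, here is a minimal sketch that uses the JDK's built-in W3C DOM API (org.w3c.dom) rather than JSoup, purely so that it runs without extra dependencies. The class name DomNodeKinds and the sample markup are made up for illustration. The point is only that text, comments and elements are all nodes, while only some nodes are elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class; uses the JDK's W3C DOM, not JSoup.
class DomNodeKinds {

    // Lists the kind of each direct child node of the root element.
    static String describeChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (out.length() > 0) {
                    out.append(", ");
                }
                switch (child.getNodeType()) {
                    case Node.ELEMENT_NODE:
                        out.append("element:").append(child.getNodeName());
                        break;
                    case Node.TEXT_NODE:
                        out.append("text");
                        break;
                    case Node.COMMENT_NODE:
                        out.append("comment");
                        break;
                    default:
                        out.append("other");
                        break;
                }
            }
            return out.toString();
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch easy to call.
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        // prints: text, element:b, comment
        System.out.println(describeChildren("<p>Hello <b>world</b><!-- note --></p>"));
    }
}
```

All three children are nodes, but only the b child is an element.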
The JSoup class model reflects this distinction: Element is a subclass of Node.
Since Element extends Node, anything you can do on a Node you can also do on an Element. But Element provides additional behaviour which makes it easier to use. For example, an Element has properties such as id and class, which make it easier to find in an HTML document.
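A small sketch of that relationship, assuming jsoup is on the classpath. The class name ElementConveniences and the sample HTML are made up for illustration. The first two calls go through a Node reference; the rest need the Element type.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

// Hypothetical demo class, not part of jsoup.
class ElementConveniences {

    static String summarize(String html) {
        Document doc = Jsoup.parse(html);
        Element p = doc.getElementById("intro"); // lookup by id is Element-level API
        Node asNode = p;                         // every Element is also a Node

        // Available on any Node:
        String name = asNode.nodeName();
        int childNodes = asNode.childNodeSize(); // counts text nodes as well as elements

        // Only available on Element:
        String id = p.id();
        String cssClass = p.className();
        int childElements = p.children().size(); // counts child elements only

        return name + "#" + id + "." + cssClass
                + " children=" + childElements + " childNodes=" + childNodes;
    }

    public static void main(String[] args) {
        // prints: p#intro.lead children=1 childNodes=2
        System.out.println(summarize("<p id=\"intro\" class=\"lead\">Hello <b>world</b></p>"));
    }
}
```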
In most cases using Element (or one of the other subclasses of Node) will meet your needs and will be easier to code against. I suspect the only scenario in which you might need to fall back to Node is if there is a specific node type in the DOM for which JSoup does not provide a subclass of Node.
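For a sense of what working at the Node level looks like, here is a hedged sketch that walks an element's child nodes and checks which subclass each one is. It assumes jsoup is on the classpath; ChildNodeKinds and the sample HTML are hypothetical names chosen for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical demo class, not part of jsoup.
class ChildNodeKinds {

    // Lists the kind of each direct child node of the first <p> in the HTML.
    static String describeChildren(String html) {
        Element p = Jsoup.parse(html).selectFirst("p");
        StringBuilder out = new StringBuilder();
        for (Node child : p.childNodes()) {
            if (out.length() > 0) {
                out.append(", ");
            }
            if (child instanceof Element) {
                out.append("element:").append(((Element) child).tagName());
            } else if (child instanceof TextNode) {
                out.append("text");
            } else if (child instanceof Comment) {
                out.append("comment");
            } else {
                // Any node type without a dedicated branch is still usable as a plain Node.
                out.append("other:").append(child.nodeName());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: text, element:b, comment
        System.out.println(describeChildren("<p>Hello <b>world</b><!-- note --></p>"));
    }
}
```

The last branch is the "fall back to Node" case: the code only relies on methods that Node itself declares.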
Here's an example showing the same HTML document inspection using both Node and Element:
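// imports assumed by this snippet: jsoup, plus Hamcrest for the assertions
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.Elements;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.instanceOf;
import static org.hamcrest.Matchers.is;
// the statements below belong inside a test method
String html = "<html><head><title>This is the head</title></head><body><p>This is the body</p></body></html>";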
Document doc = Jsoup.parse(html);
Node root = doc.root();
// some content assertions, using Node
assertThat(root.childNodes().size(), is(1));
assertThat(root.childNode(0).childNodes().size(), is(2));
assertThat(root.childNode(0).childNode(0), instanceOf(Element.class));
assertThat(((Element) root.childNode(0).childNode(0)).text(), is("This is the head"));
assertThat(root.childNode(0).childNode(1), instanceOf(Element.class));
assertThat(((Element) root.childNode(0).childNode(1)).text(), is("This is the body"));
// the same content assertions, using Element
Elements head = doc.getElementsByTag("head");
assertThat(head.size(), is(1));
assertThat(head.first().text(), is("This is the head"));
Elements body = doc.getElementsByTag("body");
assertThat(body.size(), is(1));
assertThat(body.first().text(), is("This is the body"));
YMMV, but I think the Element form is easier to use and much less error-prone.