
Difference between JSoup Element and JSoup Node

Tags:

java

jsoup

Can anyone please explain the difference between the Element object and the Node object provided by JSoup?

Which one should be used in which situation?

Abdeali Chandanwala asked Dec 19 '17 07:12



1 Answer

A node is the generic name for any type of object in the DOM hierarchy.

An element is one specific type of node.

The JSoup class model reflects this:

  • Node
  • Element
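As a rough illustration of that hierarchy, the sketch below parses a small fragment and labels each direct child of a paragraph by its concrete type. The class name NodeTypesDemo, the helper methods, and the sample HTML are made up for this example. It assumes the jsoup library is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical demo class, not part of jsoup.
public class NodeTypesDemo {

    // Returns a short label for the concrete jsoup type of a node.
    static String describe(Node node) {
        if (node instanceof Element) return "Element";
        if (node instanceof TextNode) return "TextNode";
        if (node instanceof Comment) return "Comment";
        return node.getClass().getSimpleName();
    }

    // Lists the types of the direct children of the first <p> in the HTML.
    // Assumes the HTML contains at least one <p>.
    static String childTypes(String html) {
        Element p = Jsoup.parse(html).select("p").first();
        StringBuilder sb = new StringBuilder();
        for (Node child : p.childNodes()) {
            if (sb.length() > 0) sb.append(", ");
            sb.append(describe(child));
        }
        return sb.toString();
    }

    // Counts only the children of the first <p> that are Elements.
    static int elementChildCount(String html) {
        return Jsoup.parse(html).select("p").first().children().size();
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b><!-- note --></p>";
        System.out.println(childTypes(html));
        System.out.println(elementChildCount(html));
    }
}
```

Every child here is a Node, but only the b tag is an Element, which is why childNodes() and children() return different counts for the same paragraph.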

Since Element extends Node, anything you can do on a Node you can also do on an Element. But Element provides additional behaviour that makes it easier to use. For example, an Element has properties such as id and class, which make elements easier to find in an HTML document.
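A minimal sketch of those extras, assuming jsoup is on the classpath. The class ElementExtrasDemo, its helper methods, and the sample markup are hypothetical names chosen for illustration. The first helper uses Element-only accessors, and the second reads the same class attribute through the plain Node API.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

// Hypothetical demo class, not part of jsoup.
public class ElementExtrasDemo {

    // Looks an element up by id and returns "tag.class: text"
    // using accessors that exist on Element but not on Node.
    static String summarize(String html, String id) {
        Document doc = Jsoup.parse(html);
        Element el = doc.getElementById(id);
        if (el == null) return "not found";
        return el.tagName() + "." + el.className() + ": " + el.text();
    }

    // Reads the class attribute through the generic Node API instead.
    static String classViaNode(String html, String id) {
        Node node = Jsoup.parse(html).getElementById(id);
        return node == null ? "" : node.attr("class");
    }

    public static void main(String[] args) {
        String html = "<div id=\"intro\" class=\"lead\">Hi <i>there</i></div>";
        System.out.println(summarize(html, "intro"));
        System.out.println(classViaNode(html, "intro"));
    }
}
```

Attribute access works on any Node, while tagName(), className(), and text() are among the conveniences that Element adds on top.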

In most cases, using Element (or one of the other subclasses of Node, such as TextNode or Comment) will meet your needs and will be easier to code to. I suspect the only scenario in which you might need to fall back to Node is if there is a specific node type in the DOM for which JSoup does not provide a subclass of Node.
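One situation where the generic Node view can be handy is walking a tree without caring in advance what each node is. The sketch below is only an illustration under that assumption: NodeWalkDemo and its methods are hypothetical names, and jsoup is assumed to be on the classpath. It recurses over childNodes() and picks out comments, which Element-only accessors such as children() do not return.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Node;

// Hypothetical demo class, not part of jsoup.
public class NodeWalkDemo {

    // Depth-first walk over every Node; appends each comment's trimmed text.
    static void collectComments(Node node, StringBuilder out) {
        if (node instanceof Comment) {
            out.append(((Comment) node).getData().trim()).append(';');
        }
        for (Node child : node.childNodes()) {
            collectComments(child, out);
        }
    }

    // Parses the HTML and returns all comment texts, each followed by ';'.
    static String comments(String html) {
        StringBuilder out = new StringBuilder();
        collectComments(Jsoup.parse(html), out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(
            comments("<div><!-- first --><p>text<!-- second --></p></div>"));
    }
}
```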

Here's an example showing the same HTML document inspection using both Node and Element:

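import static org.hamcrest.CoreMatchers.instanceOf;
import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.MatcherAssert.assertThat;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.Elements;

// The statements below are assumed to run inside a test method.
String html = "<html><head><title>This is the head</title></head><body><p>This is the body</p></body></html>";
Document doc = Jsoup.parse(html);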

Node root = doc.root();

// some content assertions, using Node
// root has a single child (<html>), which in turn has two children (<head> and <body>)
assertThat(root.childNodes().size(), is(1));
assertThat(root.childNode(0).childNodes().size(), is(2));
// each child must be checked and cast before Element methods such as text() are available
assertThat(root.childNode(0).childNode(0), instanceOf(Element.class));
assertThat(((Element) root.childNode(0).childNode(0)).text(), is("This is the head"));
assertThat(root.childNode(0).childNode(1), instanceOf(Element.class));
assertThat(((Element) root.childNode(0).childNode(1)).text(), is("This is the body"));

// the same content assertions, using Element
Elements head = doc.getElementsByTag("head");
assertThat(head.size(), is(1));
assertThat(head.first().text(), is("This is the head"));
Elements body = doc.getElementsByTag("body");
assertThat(body.size(), is(1));
assertThat(body.first().text(), is("This is the body"));

YMMV, but I think the Element form is easier to use and much less error-prone.

glytching answered Oct 12 '22 22:10