 

Difference between jsoup.text() and jsoup.body().text()

Tags:

java

jsoup

Using the Jsoup library, I'm trying to get the content (text only) from an HTML string. There are two methods that can give me the content:

Jsoup.parse(htmlString).body().text()
Jsoup.parse(htmlString).text()

I know that the first method returns only the text of the body. What does the second method return? Which one is better for my use case?

Note: According to the documentation, the text(String) method is used to set the text of the body of the document.
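The setter mentioned in the note and the no-argument getter do not cover the same part of the document. The sketch below, which assumes jsoup is on the classpath (the class and method names are made up for illustration), parses a page with a title, calls Document.text(String), and then reads the text back in several ways. The setter only touches the body, while the getter still reads the whole document.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; only the jsoup calls are real API.
public class TextSetterDemo {

    // Parses a small page, then calls the setter overload Document.text(String).
    static Document parseAndReplaceBody() {
        Document doc = Jsoup.parse(
                "<html><head><title>Page title</title></head>"
                + "<body><p>Old body text</p></body></html>");
        // The setter replaces only the contents of <body>; <head> and its <title> are left alone.
        doc.text("New body text");
        return doc;
    }

    public static void main(String[] args) {
        Document doc = parseAndReplaceBody();
        System.out.println(doc.title());       // prints "Page title": the title survived the setter
        System.out.println(doc.body().text()); // prints "New body text"
        System.out.println(doc.text());        // the getter reads the whole document, so the title text appears too
    }
}
```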

balajiprasadb asked Feb 06 '23 04:02


2 Answers

Each Element has the method text():

public java.lang.String text()
Gets the combined text of this element and all its children. Whitespace is normalized and trimmed.
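To make "combined text of this element and all its children" and "normalized and trimmed" concrete, here is a small sketch, assuming jsoup is on the classpath (the class, constant, and method names are illustrative only). The paragraph's text is split across a nested <b> element and padded with extra spaces and a line break.

```java
import org.jsoup.Jsoup;

// Hypothetical demo class for Element.text() whitespace handling.
public class WhitespaceDemo {
    // Text spread across a child element, with leading/trailing spaces and a newline.
    static final String HTML = "<p>  Hello \n   <b>brave</b>   world  </p>";

    static String bodyText() {
        return Jsoup.parse(HTML).body().text();
    }

    public static void main(String[] args) {
        // Whitespace runs collapse to single spaces, the ends are trimmed,
        // and the child <b> element's text is included: prints [Hello brave world]
        System.out.println("[" + bodyText() + "]");
    }
}
```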

All elements that can contain text nodes (nodes whose node.nodeName() returns #text) are supposed to be part of the body, except for the <title> tag. (The <script> and <style> tags have child nodes with node name #data, which text() does not include.)
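The #data point can be checked with a short sketch, assuming jsoup is on the classpath (class and method names are hypothetical). A <style> in the head and a <script> in the body both hold data nodes, so neither contributes to text(), although Element.data() still exposes the raw contents.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class for script/style content and text().
public class DataNodeDemo {
    static final String HTML = "<html><head><style>p { color: red; }</style></head>"
            + "<body><p>Visible text</p><script>var hidden = 1;</script></body></html>";

    static Document parse() {
        return Jsoup.parse(HTML);
    }

    public static void main(String[] args) {
        Document doc = parse();
        // Style and script contents are data nodes, so both calls print only "Visible text".
        System.out.println(doc.text());
        System.out.println(doc.body().text());

        Element script = doc.select("script").first();
        // The raw script body is still reachable through Element.data().
        System.out.println(script.data());
        // The script's only child node reports its name as "#data".
        System.out.println(script.childNode(0).nodeName());
    }
}
```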

So for a valid page, document.body().text() and document.text() return the same text as long as no <title> tag is set in the head; otherwise, document.text() additionally contains the title text, ahead of the body text.
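A minimal sketch of that comparison, assuming jsoup is on the classpath (the class, constants, and helper are illustrative names). The same body is parsed once without and once with a <title>, and the helper reports whether the two calls agree.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class comparing Document.text() with Document.body().text().
public class TitleDemo {
    static final String WITHOUT_TITLE =
            "<html><head></head><body><p>Body text</p></body></html>";
    static final String WITH_TITLE =
            "<html><head><title>Title text</title></head><body><p>Body text</p></body></html>";

    // True when the whole-document text equals the body-only text.
    static boolean sameText(String html) {
        Document doc = Jsoup.parse(html);
        return doc.text().equals(doc.body().text());
    }

    public static void main(String[] args) {
        System.out.println(sameText(WITHOUT_TITLE)); // true
        System.out.println(sameText(WITH_TITLE));    // false: the title text is included as well
        // The head comes before the body, so the title text is printed first.
        System.out.println(Jsoup.parse(WITH_TITLE).text());
    }
}
```

If only the visible page content is wanted, the body-only call avoids picking up the title.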

Frederic Klein answered Apr 15 '23 17:04


The second line returns the text of the entire HTML document, including the head (such as the title) and the body, whilst the first returns only the text of the body.

William Greenly answered Apr 15 '23 17:04