Difference between jsoup.text() and jsoup.body().text()

Question

Using the Jsoup library, I'm trying to get the content (text only) from an HTML string. There are two methods that can give me the content:

Jsoup.parse(htmlString).body().text()
Jsoup.parse(htmlString).text()

I know that the first method will return only text of the body. What does the second method return? Which one is better for my usage?

Note: According to the documentation, the text method is used to set the text of the body of the document.

Frederic Klein · Accepted Answer

Each Element has the method text():

public java.lang.String text()
Gets the combined text of this element and all its children. Whitespace is normalized and trimmed.

All elements that can contain text nodes (node.nodeName() returns #text) are supposed to be part of the body, except for the <title> tag. The <script> and <style> tags do not count here, because their child nodes have the node name #data.
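The node-name distinction above can be checked directly. This is a minimal sketch, assuming jsoup is on the classpath; the class name NodeNames, its helper methods, and the sample HTML are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class NodeNames {
    // Hypothetical sample page: a title in the head, a paragraph and a script in the body.
    static final String HTML =
        "<html><head><title>My Title</title></head>"
        + "<body><p>Hi</p><script>var x = 1;</script></body></html>";

    // Returns the node name of the first child node of the first element matching the selector.
    static String firstChildNodeName(String html, String selector) {
        Document doc = Jsoup.parse(html);
        Element element = doc.select(selector).first();
        return element.childNode(0).nodeName();
    }

    // Returns the body text; script content is a data node, so text() does not include it.
    static String bodyText(String html) {
        return Jsoup.parse(html).body().text();
    }

    public static void main(String[] args) {
        System.out.println(firstChildNodeName(HTML, "title"));  // child of <title> is a text node
        System.out.println(firstChildNodeName(HTML, "script")); // child of <script> is a data node
        System.out.println(bodyText(HTML));
        // The script source is still reachable through data() instead of text().
        System.out.println(Jsoup.parse(HTML).select("script").first().data());
    }
}
```

If you do need the script or style content, Element.data() is the accessor to look at rather than text().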

So for a valid page, document.body().text() and document.text() return the same text as long as no <title> tag is set in the head. Otherwise, document.text() will additionally contain the title text.
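To see the difference in practice, here is a small sketch that compares the two calls on a page with and without a title. It assumes jsoup is on the classpath; the class name TextVsBodyText, its helper methods, and the sample HTML strings are hypothetical.

```java
import org.jsoup.Jsoup;

public class TextVsBodyText {
    // Hypothetical sample pages, identical except for the <title> in the head.
    static final String WITH_TITLE =
        "<html><head><title>My Title</title></head><body><p>Hello world</p></body></html>";
    static final String WITHOUT_TITLE =
        "<html><head></head><body><p>Hello world</p></body></html>";

    // Text of the <body> element and its descendants only.
    static String bodyText(String html) {
        return Jsoup.parse(html).body().text();
    }

    // Text of the whole document, which also covers the <head>.
    static String documentText(String html) {
        return Jsoup.parse(html).text();
    }

    public static void main(String[] args) {
        // With a title, the document text additionally contains "My Title".
        System.out.println("body:     " + bodyText(WITH_TITLE));
        System.out.println("document: " + documentText(WITH_TITLE));

        // Without a title, both calls return the same string.
        System.out.println(bodyText(WITHOUT_TITLE).equals(documentText(WITHOUT_TITLE)));
    }
}
```

If the title should never appear in your extracted content, body().text() is the safer choice; if the input is an HTML fragment without a head, either call gives the same result.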

William Greenly · Answer

The second line includes the text from the entire HTML document including the head, title and body, whilst the first only includes text from the body.

Tags:

java

jsoup

balajiprasadb
