
How to get text from this html tag by using jsoup?

Tags:

java

html

jsoup

I ran into a problem while using jsoup to extract data. The data looks like this:

This is a <strong>strong</strong> number <date>2013</date>

I want to get only the text that is not inside a child tag: This is a number

How can I do that?

user2269351 asked Apr 11 '13 10:04


2 Answers

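You can parse the HTML into a Document, select the body element, and read its text. ownText() returns only the text that belongs directly to the element and skips anything inside child elements, while text() returns the combined text of the element and all of its children.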

Example:

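import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

Document doc = Jsoup.parse("This is a <strong>strong</strong> number <date>2013</date>");

// ownText(): text directly inside <body>, without the text of child elements
String ownText = doc.body().ownText();
// text(): combined text of <body> and all of its descendants
String text = doc.body().text();

System.out.println(ownText);
System.out.println(text);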

Output:

This is a number  
This is a strong number 2013
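
To see why ownText() drops "strong" and "2013": it only collects the text nodes that sit directly under the element. The sketch below is not jsoup. It reimplements the same idea with the JDK's built-in XML DOM parser so it runs without extra dependencies. The names OwnTextSketch, ownTextOf and allTextOf are made up for illustration, and it assumes the input is well-formed XML, which jsoup itself does not require.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OwnTextSketch {

    // Parses a well-formed XML fragment and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Joins only the text nodes that are direct children of the root element.
    static String ownTextOf(String xml) {
        StringBuilder sb = new StringBuilder();
        NodeList children = parseRoot(xml).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return normalize(sb.toString());
    }

    // Joins the text of the root element and all of its descendants.
    static String allTextOf(String xml) {
        return normalize(parseRoot(xml).getTextContent());
    }

    // Trims the string and collapses runs of whitespace into a single space.
    static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String xml = "<body>This is a <strong>strong</strong> number <date>2013</date></body>";
        System.out.println(ownTextOf(xml)); // This is a number
        System.out.println(allTextOf(xml)); // This is a strong number 2013
    }
}
```

With jsoup you do not need any of this, since doc.body().ownText() already does the filtering and also copes with HTML that is not well-formed.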
ollo answered Sep 28 '22 18:09


This should answer your question:

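public String escapeHtml(String source) {
    Document doc = Jsoup.parseBodyFragment(source);
    // Replace each <b> element with a text node holding its markup,
    // so the tag is escaped in the output instead of being removed
    Elements elements = doc.select("b");
    for (Element element : elements) {
        element.replaceWith(new TextNode(element.toString(), ""));
    }
    // Keep only <a> tags with the listed attributes
    // (newer jsoup releases rename Whitelist to Safelist)
    Whitelist allowed = new Whitelist()
            .addTags("a")
            .addAttributes("a", "href", "name", "rel", "target");
    return Jsoup.clean(doc.body().toString(), allowed);
}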

See also: Jsoup - Howto clean html by escaping not deleting the unwanted html?

Mehdi Karamosly answered Sep 28 '22 17:09