I ran into a problem while using jsoup to extract data. The data looks like this:
This is a <strong>strong</strong> number <date>2013</date>
I want to get data like this: This is a number
How can I do that? Can anyone help me?
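You can parse the HTML into a Document, select the body element, and get its text. ownText() returns only the text that sits directly inside the element, while text() also includes the text of its child elements.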
Example:
Document doc = Jsoup.parse("This is a <strong>strong</strong> number <date>2013</date>");
String ownText = doc.body().ownText();
String text = doc.body().text();
System.out.println(ownText);
System.out.println(text);
Output:
This is a number
This is a strong number 2013
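Note that ownText() drops the text of every child element, including ones you might want to keep. If only certain tags should be discarded, one possible alternative is to remove those elements first and then read the remaining text. The following is a minimal sketch of that idea; the class name StripTagsExample and the helper textWithout are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StripTagsExample {

    // Hypothetical helper: parses the fragment, removes every element
    // matching cssQuery (together with its contents), and returns the
    // normalized text that is left in the body.
    static String textWithout(String html, String cssQuery) {
        Document doc = Jsoup.parseBodyFragment(html);
        doc.select(cssQuery).remove();
        return doc.body().text();
    }

    public static void main(String[] args) {
        String html = "This is a <strong>strong</strong> number <date>2013</date>";
        // Drop both <strong> and <date>, keep everything else.
        System.out.println(textWithout(html, "strong, date")); // prints "This is a number"
    }
}
```

Passing only "date" as the selector would keep the text of the strong element and drop just the year.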
This should answer your question:
public String escapeHtml(String source) {
    Document doc = Jsoup.parseBodyFragment(source);
    Elements elements = doc.select("b");
    for (Element element : elements) {
        element.replaceWith(new TextNode(element.toString(),""));
    }
    return Jsoup.clean(doc.body().toString(), new Whitelist().addTags("a").addAttributes("a", "href", "name", "rel", "target"));
}
Jsoup - Howto clean html by escaping not deleting the unwanted html?