I want to remove HTML tags from a String. I know this is easy; here is what I did:
public String removerTags(String html)
{
    return html.replaceAll("\\<(/?[^\\>]+)\\>", " ").replaceAll("\\s+", " ").trim();
}
The problem is that I do not want to remove all the tags. I want the tag
<span style="background-color: yellow">(text)</span>
to stay intact in the string.
I'm using this tag as a kind of "highlight" for search results in a web application I'm building with GWT.
I need to strip the other tags because if the search finds text that contains some HTML tag (the indexing is done by Lucene) and that tag is broken, the appendHTML method of SafeHtmlBuilder is unable to build the String.
Is there a reasonably good way to do this?
Hugs.
I strongly suggest you use jsoup for this. Regular expressions simply aren't well suited to parsing HTML, imo. With jsoup this is basically a simple, readable and easily maintainable one-liner!
Have a look at the Jsoup.clean method, and perhaps this article:
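
If adding a library is not an option, the regex from the question can be adapted as a dependency-free fallback. This is not the jsoup approach recommended above, only a minimal sketch of a swapped-in technique: a negative lookahead that skips span tags while every other tag is replaced. The class and method names (HighlightTagStripper, removeTagsExceptSpan) are hypothetical, and the sketch assumes the only span tags in the input are the highlight ones, since it keeps every span regardless of its style attribute.

```java
import java.util.regex.Pattern;

class HighlightTagStripper {

    // Matches any tag EXCEPT an opening <span ...> or a closing </span>.
    // The negative lookahead (?!/?span\b) rejects tags whose name is exactly "span";
    // (?i) makes the tag-name check case-insensitive.
    private static final Pattern NON_SPAN_TAG =
            Pattern.compile("(?i)<(?!/?span\\b)/?[^>]+>");

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: replaces every non-span tag with a space,
    // then collapses runs of whitespace, as the original method did.
    static String removeTagsExceptSpan(String html) {
        String withoutTags = NON_SPAN_TAG.matcher(html).replaceAll(" ");
        return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String input = "<p>Found <span style=\"background-color: yellow\">term</span>"
                + " in <b>text</b></p>";
        System.out.println(removeTagsExceptSpan(input));
    }
}
```

Like the original regex, this does not understand comments, scripts, or a ">" inside an attribute value, so it is only reasonable for input whose shape you control.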