I have a user-submitted string that contains HTML content such as
"<p></p><div></div><p>Hello<br/>world</p><p></p>"
I would like to transform this string so that empty tag pairs are removed, but self-closing tags such as <br/>
are retained. For example, the transformation should convert the string above to
"<p>Hello<br/>world</p>"
I'd like to use jsoup for this, since it is already on my classpath and it would be easiest for me to perform the transformation on the server side.
Here is an example that does just that (using JSoup):
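import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<p></p><div></div><p>Hello<br/>world</p><p></p>";
Document doc = Jsoup.parse(html);
for (Element element : doc.select("*")) {
    // Remove block-level elements that contain no text.
    // Inline elements such as <br/> are not block-level, so they are kept.
    if (!element.hasText() && element.isBlock()) {
        element.remove();
    }
}
System.out.println(doc.body().html());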
The output of the code above is what you are looking for:
<p>Hello<br />world</p>
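One caveat with the check above: it only looks at block-level elements and at text, so an empty inline pair such as <span></span> is kept, while a block that holds only an image (for example <p><img src="a.png"></p>) is dropped. If that matters for your input, here is a rough sketch of a stricter variant. The class and method names (EmptyPairJsoup, stripEmptyPairs) are made up for illustration, and treating whitespace-only elements as empty is my assumption, not something the question states.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical helper class; the name is illustrative only.
final class EmptyPairJsoup {

    static String stripEmptyPairs(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        // Turn off pretty printing so the serializer does not add line breaks.
        doc.outputSettings().prettyPrint(false);

        Element body = doc.body();
        Elements all = body.select("*"); // document order, includes body itself

        // Walk in reverse document order so that children are handled before
        // their parents; a parent left with nothing inside is then removed too.
        for (int i = all.size() - 1; i >= 0; i--) {
            Element el = all.get(i);
            if (el == body) {
                continue; // never remove the body wrapper
            }
            boolean selfClosing = el.tag().isSelfClosing(); // e.g. <br/>, <img>
            boolean empty = el.children().isEmpty() // no element children left
                    && !el.hasText()                // no non-blank text (assumption: blank counts as empty)
                    && el.data().isEmpty();         // no script/style data
            if (!selfClosing && empty) {
                el.remove();
            }
        }
        return body.html();
    }

    public static void main(String[] args) {
        System.out.println(stripEmptyPairs("<p></p><div></div><p>Hello<br/>world</p><p></p>"));
    }
}
```

For the string from the question this should leave only the paragraph containing "Hello" and "world". Whether the line break is written back as <br> or <br /> seems to depend on the jsoup version and output settings, so I would not rely on the exact form.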
Not really familiar with jsoup, but you could do this with a simple regex replace:
String html = "<p></p><div></div><p>Hello<br/>world</p><p></p>";
html = html.replaceAll("<([^>]*)></\\1>", "");
Although with a full parser you could probably just drop empty content during processing, depending on what you're eventually going to do with it.
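Two things to watch with the one-line regex: it won't match a pair whose opening tag has attributes (the backreference would have to include them), and a single pass leaves the outer tag behind when empty pairs are nested. A minimal sketch that works around both, using only java.util.regex, might look like the following. The class name, method name, and pattern are my own, and counting whitespace-only content as empty is an assumption. It is still a regex, so an attribute value containing ">" will confuse it.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
final class EmptyPairRegex {

    // Opening tag (name captured, attributes allowed), optional whitespace,
    // then the closing tag with the same name.
    private static final Pattern EMPTY_PAIR =
            Pattern.compile("<([A-Za-z][A-Za-z0-9]*)(?:\\s[^>]*)?>\\s*</\\1\\s*>");

    static String stripEmptyPairs(String html) {
        String previous;
        String current = html;
        // Repeat until nothing changes, so that a pair which becomes empty
        // after its children were removed is removed on a later pass.
        do {
            previous = current;
            current = EMPTY_PAIR.matcher(previous).replaceAll("");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        // prints <p>Hello<br/>world</p>
        System.out.println(stripEmptyPairs("<p></p><div></div><p>Hello<br/>world</p><p></p>"));
        // prints <p>kept</p>
        System.out.println(stripEmptyPairs("<div class=\"x\"><p> </p></div><p>kept</p>"));
    }
}
```

Since <br/> has no separate closing tag, the pattern never matches it, so it stays in place.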