How to remove all inline styles and other attributes (class, onclick) from HTML elements using Jsoup?
Sample input:
<div style="padding-top:25px;" onclick="javascript:alert('hi');">
This is a sample div <span class='sampleclass'> This is a sample span </span>
</div>
Sample output:
<div>This is a sample div <span> This is a sample span </span> </div>
My code (is this the right way, or is there a better approach?):
Document doc = Jsoup.parse(html);
Elements el = doc.getAllElements();
for (Element e : el) {
    Attributes at = e.attributes();
    for (Attribute a : at) {
        e.removeAttr(a.getKey());
    }
}
Yes, one approach is indeed to iterate through the elements and call removeAttr() on each attribute.
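One caveat with the loop approach: removing attributes while iterating over the element's live Attributes object may skip entries or throw, depending on the jsoup version. A minimal sketch that avoids this by copying the keys first is shown here. The class name AttributeStripper and the method stripAllAttributes are illustrative names, not part of jsoup, and the sketch assumes the input is a body fragment.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AttributeStripper {

    // Hypothetical helper: removes every attribute from every element of the fragment.
    static String stripAllAttributes(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        for (Element element : doc.getAllElements()) {
            // Snapshot the keys so the live Attributes object is not modified while it is iterated.
            List<String> keys = new ArrayList<>();
            for (Attribute attribute : element.attributes()) {
                keys.add(attribute.getKey());
            }
            for (String key : keys) {
                element.removeAttr(key);
            }
        }
        // Return only the body content, since the input was a fragment.
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<div style=\"padding-top:25px;\" onclick=\"javascript:alert('hi');\">"
                + "This is a sample div <span class='sampleclass'> This is a sample span </span></div>";
        System.out.println(stripAllAttributes(html));
    }
}
```

Unlike a cleaner, this keeps every tag in the document and only drops the attributes, which may be what you want when the markup itself is trusted.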
An alternative is to use jsoup's Whitelist class (see docs; newer jsoup releases rename it to Safelist) together with the Jsoup.clean() method. The cleaner treats the input as body HTML and removes every tag and attribute that the whitelist does not explicitly allow.
For example:
String html = "<html><head></head><body><div style=\"padding-top:25px;\" onclick=\"javascript:alert('hi');\">This is a sample div <span class='sampleclass'>This is a simple span</span></div></body></html>";
Whitelist wl = Whitelist.simpleText();
wl.addTags("div", "span"); // add additional tags here as necessary
String clean = Jsoup.clean(html, wl);
System.out.println(clean);
This produces the following output:
11-05 19:56:39.302: I/System.out(414): <div>
11-05 19:56:39.302: I/System.out(414): This is a sample div
11-05 19:56:39.302: I/System.out(414): <span>This is a simple span</span>
11-05 19:56:39.302: I/System.out(414): </div>
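On newer jsoup releases the same idea can be written with Safelist, which replaces Whitelist. The following is a minimal sketch under that assumption; the class name SafelistCleaner and the method keepDivAndSpan are illustrative names only. It starts from Safelist.none() rather than simpleText(), so only the tags added explicitly survive, and it turns off pretty-printing so the cleaner does not add line breaks.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Safelist;

public class SafelistCleaner {

    // Hypothetical helper: keeps only <div> and <span> tags plus text; no attributes are allowed.
    static String keepDivAndSpan(String html) {
        Safelist safelist = Safelist.none().addTags("div", "span");
        // Disable pretty-printing so the result comes back without added line breaks.
        Document.OutputSettings settings = new Document.OutputSettings().prettyPrint(false);
        // The empty base URI is fine here because no link attributes are being kept.
        return Jsoup.clean(html, "", safelist, settings);
    }

    public static void main(String[] args) {
        String html = "<div style=\"padding-top:25px;\" onclick=\"javascript:alert('hi');\">"
                + "This is a sample div <span class='sampleclass'> This is a sample span </span></div>";
        System.out.println(keepDivAndSpan(html));
    }
}
```

Note the difference from the loop approach: the cleaner also drops any tag that is not listed, so every tag you want to keep has to be passed to addTags().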