I'm looking for a html sanitizer which I can call per API to sanitise strings which I get from my webapp. Are there some useful easy to use libs available? Does anyone knows maybe one or two?
I don't need something big it just must be able to find unclosed tags and close them.
Sanitize a string immediately setHTML() is used to sanitize a string of HTML and insert it into the Element with an id of target . The script element is disallowed by the default sanitizer so the alert is removed.
HTML sanitization is an OWASP-recommended strategy to prevent XSS vulnerabilities in web applications. HTML sanitization offers a security mechanism to remove unsafe (and potentially malicious) content from untrusted raw HTML strings before presenting them to the user.
HTML sanitization is the process of examining an HTML document and producing a new HTML document that preserves only whatever tags are designated “safe” and desired. HTML sanitization can be used to protect against cross-site scripting (XSS) attacks by sanitizing any HTML code submitted by a user.
https://github.com/OWASP/java-html-sanitizer is now marked ready for production use.
A fast and easy to configure HTML Sanitizer written in Java which lets you include HTML authored by third-parties in your web application while protecting against XSS.
You can use prepackaged policies
Sanitizers.FORMATTING.and(Sanitizers.LINKS)
or the tests show how you can configure your own easily:
new HtmlPolicyBuilder()
.allowElements("a")
.allowUrlProtocols("https")
.allowAttributes("href").onElements("a")
.requireRelNofollowOnLinks()
or write custom policies to do things like changing h1
s to div
s with a certain class:
new HtmlPolicyBuilder()
.allowElements("h1", "p")
.allowElements(
new ElementPolicy() {
public String apply(String elementName, List<String> attrs) {
attrs.add("class");
attrs.add("header-" + elementName);
return "div";
}
}, "h1"))
JTidy may help you.
The HTML Parser JSoup also supports sanitisation by policy: http://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer
Apart from JTidy you can also take a look at:
Nekohtml
TagSoup
Getting text in HTmL document
http://roberto.open-lab.com/2009/11/05/a-java-html-sanitizer-also-against-xss/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With