I'm looking for class/util etc. to sanitize HTML code i.e. remove dangerous tags, attributes and values to avoid XSS and similar attacks.
I get html code from rich text editor (e.g. TinyMCE) but it can be send malicious way around, ommiting TinyMCE validation ("Data submitted form off-site").
Is there anything as simple to use as InputFilter in PHP? Perfect solution I can imagine works like that (assume sanitizer is encapsulated in HtmlSanitizer class):
String unsanitized = "...<...>..."; // some potentially // dangerous html here on input HtmlSanitizer sat = new HtmlSanitizer(); // sanitizer util class created String sanitized = sat.sanitize(unsanitized); // voila - sanitized is safe...
Update - the simpler solution, the better! Small util class with as little external dependencies on other libraries/frameworks as possible - would be best for me.
How about that?
Sanitize a string immediatelysetHTML() is used to sanitize a string of HTML and insert it into the Element with an id of target . The script element is disallowed by the default sanitizer so the alert is removed.
No. Putting aside the subject of allowing some tags (not really the point of the question), HtmlEncode simply does NOT cover all XSS attacks.
Summary. xss-sanitize allows you to accept html from untrusted sources by first filtering it through a white list. The white list filtering is fairly comprehensive, including support for css in style attributes, but there are limitations enumerated below.
The OWASP HTML Sanitizer is a fast and easy to configure HTML Sanitizer written in Java which lets you include HTML authored by third-parties in your web application while protecting against XSS. The existing dependencies are on guava and JSR 305. The other jars are only needed by the test suite.
You can try OWASP Java HTML Sanitizer. It is very simple to use.
PolicyFactory policy = new HtmlPolicyBuilder() .allowElements("a") .allowUrlProtocols("https") .allowAttributes("href").onElements("a") .requireRelNofollowOnLinks() .build(); String safeHTML = policy.sanitize(untrustedHTML);
You could use OWASP ESAPI for Java, which is a security library that is built to do such operations.
Not only does it have encoders for HTML, it also has encoders to perform JavaScript, CSS and URL encoding. Sample uses of ESAPI can be found in the XSS prevention cheatsheet published by OWASP.
You could use the OWASP AntiSamy project to define a site policy that states what is allowed in user-submitted content. The site policy can be later used to obtain "clean" HTML that is displayed back. You can find a sample TinyMCE policy file on the AntiSamy downloads page.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With