
How to sanitize HTML code to prevent XSS attacks in Java or JSP?

Tags: java, jsp, xss

I'm writing a servlet-based application in which I need to provide a messaging system. I'm in a rush, so I chose CKEditor to provide editing capabilities, and I currently insert the generated HTML directly into the web page that displays all messages (messages are stored in a MySQL database, FYI). CKEditor already filters HTML based on a whitelist, but a user can still inject malicious code by sending a POST request directly, so this is not enough.

A good library already exists to prevent XSS attacks by filtering HTML tags, but it's written in PHP: HTML Purifier.

So, is there a similar mature library that can be used in Java? A simple whitelist-based string replacement doesn't seem to be enough, since I'd like to filter malformed tags too (which could break the layout of the page on which the message is displayed).

If there isn't, then how should I proceed? An XML parser seems overkill.
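To make the concern about simple string replacement concrete, here is a minimal sketch using only the standard library. The class and method names (`NaiveFilterDemo`, `naiveStrip`) are hypothetical and not taken from any library; the filter is deliberately naive and is shown only to illustrate how a single-pass tag removal can be defeated and how it ignores malformed markup.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: a deliberately naive filter, NOT a real sanitizer.
class NaiveFilterDemo {
    // Matches opening and closing script tags, case-insensitively.
    private static final Pattern SCRIPT_TAG =
        Pattern.compile("</?script[^>]*>", Pattern.CASE_INSENSITIVE);

    // Deletes every script tag found in one pass over the input.
    static String naiveStrip(String html) {
        return SCRIPT_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Nested input: deleting the inner tags reassembles the outer ones.
        String attack = "<scr<script>ipt>alert(1)</scr</script>ipt>";
        System.out.println(naiveStrip(attack)); // prints <script>alert(1)</script>

        // Unclosed formatting tag: passes through untouched, so the bold
        // style would leak into whatever follows the message on the page.
        String malformed = "<b>hello";
        System.out.println(naiveStrip(malformed)); // prints <b>hello
    }
}
```

Both cases call for something that parses the markup and re-serializes a well-formed result, rather than pattern matching on the raw string.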

Note: There are a lot of questions about this on SO, but all the answers suggest filtering out ALL HTML tags; I want to keep valid formatting tags.

KeatsPeeks asked Aug 27 '10 18:08



2 Answers

I'd recommend using Jsoup for this. Here's a relevant extract from its site.

Sanitize untrusted HTML

Problem

You want to allow untrusted users to supply HTML for output on your website (e.g. as comment submission). You need to clean this HTML to avoid cross-site scripting (XSS) attacks.

Solution

Use the jsoup HTML Cleaner with a configuration specified by a Whitelist.

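import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist; // renamed to Safelist in newer jsoup releases (1.14.2 and later)

String unsafe =
    "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
String safe = Jsoup.clean(unsafe, Whitelist.basic());
// now: <p><a href="http://example.com/" rel="nofollow">Link</a></p>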

Jsoup has other advantages as well. See also "Pros and Cons of HTML parsers in Java".

BalusC answered Oct 14 '22 08:10


You should use OWASP AntiSamy. (That's what I did.)

Thierry-Dimitri Roy answered Oct 14 '22 07:10