Java: Best way to remove Javascript from HTML

What's the best library/approach for removing Javascript from HTML that will be displayed?

For example, take:

<html><body><span onmousemove='doBadXss()'>test</span></body></html>

and leave:

<html><body><span>test</span></body></html>

I see the DeXSS project. But is that the best way to go?

mtyson asked Nov 11 '10 16:11


1 Answer

JSoup has a simple method for sanitizing HTML based on a whitelist. Check http://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer

It uses a whitelist, which is safer than the blacklist approach DeXSS uses. From the DeXSS page:

There are still a number of known XSS attacks that DeXSS does not yet detect.

A blacklist disallows only known unsafe constructions, while a whitelist allows only known safe constructions. Unknown, possibly unsafe constructions are therefore blocked only by a whitelist.
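
To make the difference concrete, here is a minimal sketch in plain Java. It is not jsoup's or DeXSS's API; the class name, the method names, and the two attribute sets are hypothetical and chosen only for illustration. It filters the attributes of a single element two ways, using the onmousemove handler from the question as the "unknown" construction the blacklist author did not anticipate.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative only: a real sanitizer such as jsoup works on a parsed document,
// not on a bare attribute map. All names and lists here are assumptions.
class AttributeFilterSketch {
    // Hypothetical blacklist: the event handlers someone happened to think of.
    static final Set<String> BLOCKED = Set.of("onclick", "onload", "onerror");
    // Hypothetical whitelist: the only attributes considered safe.
    static final Set<String> ALLOWED = Set.of("title", "class");

    // Blacklist: keep everything except attributes known to be unsafe.
    static Map<String, String> filterWithBlacklist(Map<String, String> attrs) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            if (!BLOCKED.contains(e.getKey().toLowerCase(Locale.ROOT))) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    // Whitelist: drop everything except attributes known to be safe.
    static Map<String, String> filterWithWhitelist(Map<String, String> attrs) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            if (ALLOWED.contains(e.getKey().toLowerCase(Locale.ROOT))) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("title", "greeting");
        attrs.put("onmousemove", "doBadXss()"); // not on the blacklist above

        System.out.println(filterWithBlacklist(attrs)); // {title=greeting, onmousemove=doBadXss()}
        System.out.println(filterWithWhitelist(attrs)); // {title=greeting}
    }
}
```

The blacklist lets onmousemove through because nobody listed it, while the whitelist removes it without needing to know it exists. For real HTML, use a parser-based sanitizer such as the jsoup cleaner linked above rather than hand-rolled filtering.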

beetstra answered Oct 13 '22 04:10