Sanitize/Rewrite HTML on the Client Side

I need to display external resources loaded via cross domain requests and make sure to only display "safe" content.

I could use Prototype's String#stripScripts to remove script blocks, but handlers such as onclick or onerror would still be there.

Is there any library which can at least

  • strip script blocks,
  • kill DOM handlers,
  • remove blacklisted tags (e.g. embed or object).

Are there any JavaScript-related links and examples out there?

aemkei asked Nov 17 '08 13:11



2 Answers

Update 2016: There is now a Google Closure package based on the Caja sanitizer.

It has a cleaner API, was rewritten to take into account APIs available on modern browsers, and interacts better with Closure Compiler.


Shameless plug: see caja/plugin/html-sanitizer.js for a client-side HTML sanitizer that has been thoroughly reviewed.

It is whitelist-based, not blacklist-based, but the whitelists are configurable as per CajaWhitelists.


If you want to remove all tags, then do the following:

    var tagBody = '(?:[^"\'>]|"[^"]*"|\'[^\']*\')*';

    var tagOrComment = new RegExp(
        '<(?:'
        // Comment body.
        + '!--(?:(?:-*[^->])*--+|-?)'
        // Special "raw text" elements whose content should be elided.
        + '|script\\b' + tagBody + '>[\\s\\S]*?</script\\s*'
        + '|style\\b' + tagBody + '>[\\s\\S]*?</style\\s*'
        // Regular name
        + '|/?[a-z]'
        + tagBody
        + ')>',
        'gi');

    function removeTags(html) {
      var oldHtml;
      do {
        oldHtml = html;
        html = html.replace(tagOrComment, '');
      } while (html !== oldHtml);
      return html.replace(/</g, '&lt;');
    }
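To see roughly what this does in practice, the snippet below repeats the function so that it runs standalone (for example in Node.js) and feeds it a few sample inputs. The sample strings are illustrative only and are not part of the original answer.

```javascript
// Same regular expression and function as above, repeated so this runs on its own.
var tagBody = '(?:[^"\'>]|"[^"]*"|\'[^\']*\')*';

var tagOrComment = new RegExp(
    '<(?:'
    // Comment body.
    + '!--(?:(?:-*[^->])*--+|-?)'
    // Raw text elements: drop the content along with the tags.
    + '|script\\b' + tagBody + '>[\\s\\S]*?</script\\s*'
    + '|style\\b' + tagBody + '>[\\s\\S]*?</style\\s*'
    // Any other opening or closing tag.
    + '|/?[a-z]'
    + tagBody
    + ')>',
    'gi');

function removeTags(html) {
  var oldHtml;
  // Repeat until stable, so that removing one tag cannot expose another.
  do {
    oldHtml = html;
    html = html.replace(tagOrComment, '');
  } while (html !== oldHtml);
  // Escape any stray "<" that did not form a tag.
  return html.replace(/</g, '&lt;');
}

console.log(removeTags('<b>bold</b> text'));                    // bold text
console.log(removeTags('a<script>alert(1)</script>b'));         // ab
console.log(removeTags('<img src=bogus onerror=alert(1337)>')); // (empty string)
console.log(removeTags('1 < 2'));                               // 1 &lt; 2
```

Note that this removes every tag. It does not keep a safe subset of markup, which is what the Caja sanitizer is for.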

People will tell you that you can create an element, and assign innerHTML and then get the innerText or textContent, and then escape entities in that. Do not do that. It is vulnerable to XSS injection since <img src=bogus onerror=alert(1337)> will run the onerror handler even if the node is never attached to the DOM.
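If the goal is only to show untrusted markup as literal text, the escaping can be done on the string itself, with no DOM parsing step at all. A minimal sketch, assuming plain string input; escapeHtml and HTML_ESCAPES are hypothetical names introduced here, not part of Caja or of the original answer:

```javascript
// Hypothetical helper: maps each HTML-significant character to its entity.
var HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;'
};

function escapeHtml(text) {
  // Works purely on the string, so no element is ever created or parsed.
  return String(text).replace(/[&<>"']/g, function (ch) {
    return HTML_ESCAPES[ch];
  });
}

console.log(escapeHtml('<img src=bogus onerror=alert(1337)>'));
// &lt;img src=bogus onerror=alert(1337)&gt;
```

Because the input is never handed to the HTML parser, the onerror handler in the example has no chance to run.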

Mike Samuel answered Sep 23 '22 15:09


The Google Caja HTML sanitizer can be made "web-ready" by embedding it in a web worker. Any global variables introduced by the sanitizer will be contained within the worker, plus processing takes place in its own thread.

For browsers that do not support Web Workers, we can use an iframe as a separate environment for the sanitizer to work in. Timothy Chien has a polyfill that does just this, using iframes to simulate Web Workers, so that part is done for us.

The Caja project has a wiki page on how to use Caja as a standalone client-side sanitizer:

  • Check out the source, then build by running ant
  • Include html-sanitizer-minified.js or html-css-sanitizer-minified.js in your page
  • Call html_sanitize(...)

The worker script only needs to follow those instructions:

    importScripts('html-css-sanitizer-minified.js'); // or 'html-sanitizer-minified.js'

    var urlTransformer, nameIdClassTransformer;

    // customize if you need to filter URLs and/or ids/names/classes
    urlTransformer = nameIdClassTransformer = function(s) { return s; };

    // when we receive some HTML
    self.onmessage = function(event) {
        // sanitize, then send the result back
        postMessage(html_sanitize(event.data, urlTransformer, nameIdClassTransformer));
    };
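The identity transformers above pass every URL and name through unchanged. As a rough sketch of what a stricter pair might look like: the function names below are hypothetical, and the assumption that the sanitizer drops an attribute when its transformer returns null should be checked against the Caja version you build.

```javascript
// Hypothetical URL transformer: keep only absolute http(s) URLs.
// Assumption: a null return makes the sanitizer drop the attribute.
function httpOnlyUrlTransformer(url) {
  return /^https?:\/\//i.test(String(url)) ? url : null;
}

// Hypothetical id/name/class transformer: prefix every token so that
// external content cannot collide with ids or classes used by the host page.
function prefixingNameTransformer(names) {
  return String(names)
    .split(/\s+/)
    .filter(Boolean)
    .map(function (name) { return 'ext-' + name; })
    .join(' ');
}

console.log(httpOnlyUrlTransformer('https://example.com/a.png')); // https://example.com/a.png
console.log(httpOnlyUrlTransformer('javascript:alert(1)'));       // null
console.log(prefixingNameTransformer('foo bar'));                 // ext-foo ext-bar
```

These would be passed as the second and third arguments to html_sanitize(...) in place of the identity functions.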

(A bit more code is needed to get the simworker library working, but it's not important to this discussion.)
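On the page side, the worker can be driven roughly as follows. This is a sketch rather than the code from the demo: sanitizeWithWorker is a hypothetical helper, and the file name 'sanitizer-worker.js' and the element id 'output' are assumptions. It also assumes Promise is available, which may not hold in the older browsers that need the iframe polyfill.

```javascript
// Hypothetical helper: wraps one round trip to the sanitizer worker in a Promise.
// `worker` is anything with postMessage/onmessage/onerror, such as a Web Worker.
function sanitizeWithWorker(worker, dirtyHtml) {
  return new Promise(function (resolve, reject) {
    // The worker script replies with the sanitized HTML as event.data.
    worker.onmessage = function (event) { resolve(event.data); };
    worker.onerror = function (err) { reject(err); };
    worker.postMessage(dirtyHtml);
  });
}

// Browser usage (file name and element id are assumptions):
//
//   var worker = new Worker('sanitizer-worker.js');
//   sanitizeWithWorker(worker, untrustedHtml).then(function (cleanHtml) {
//     document.getElementById('output').innerHTML = cleanHtml;
//   });
```

Each call overwrites the worker's onmessage handler, so this sketch only supports one request in flight at a time; queueing or tagging messages with an id would be needed for concurrent requests.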

Demo: https://dl.dropbox.com/u/291406/html-sanitize/demo.html

Jeffery To answered Sep 22 '22 15:09