I need to display external resources loaded via cross domain requests and make sure to only display "safe" content.
Could use Prototype's String#stripScripts to remove script blocks. But handlers such as onclick
or onerror
are still there.
Is there any library which can at least
embed
or object
).So are any JavaScript related links and examples out there?
HTML sanitization is an OWASP-recommended strategy to prevent XSS vulnerabilities in web applications. HTML sanitization offers a security mechanism to remove unsafe (and potentially malicious) content from untrusted raw HTML strings before presenting them to the user.
Client side sanitation/validation should be used for few reasons: easier and faster way to tell the non-malicious user what he did wrong. decrease the number of times non-malicious user communicate with your server (in case of errors)
Sanitize a string immediatelysetHTML() is used to sanitize a string of HTML and insert it into the Element with an id of target . The script element is disallowed by the default sanitizer so the alert is removed.
Update 2016: There is now a Google Closure package based on the Caja sanitizer.
It has a cleaner API, was rewritten to take into account APIs available on modern browsers, and interacts better with Closure Compiler.
Shameless plug: see caja/plugin/html-sanitizer.js for a client side html sanitizer that has been thoroughly reviewed.
It is white-listed, not black-listed, but the whitelists are configurable as per CajaWhitelists
If you want to remove all tags, then do the following:
var tagBody = '(?:[^"\'>]|"[^"]*"|\'[^\']*\')*'; var tagOrComment = new RegExp( '<(?:' // Comment body. + '!--(?:(?:-*[^->])*--+|-?)' // Special "raw text" elements whose content should be elided. + '|script\\b' + tagBody + '>[\\s\\S]*?</script\\s*' + '|style\\b' + tagBody + '>[\\s\\S]*?</style\\s*' // Regular name + '|/?[a-z]' + tagBody + ')>', 'gi'); function removeTags(html) { var oldHtml; do { oldHtml = html; html = html.replace(tagOrComment, ''); } while (html !== oldHtml); return html.replace(/</g, '<'); }
People will tell you that you can create an element, and assign innerHTML
and then get the innerText
or textContent
, and then escape entities in that. Do not do that. It is vulnerable to XSS injection since <img src=bogus onerror=alert(1337)>
will run the onerror
handler even if the node is never attached to the DOM.
The Google Caja HTML sanitizer can be made "web-ready" by embedding it in a web worker. Any global variables introduced by the sanitizer will be contained within the worker, plus processing takes place in its own thread.
For browsers that do not support Web Workers, we can use an iframe as a separate environment for the sanitizer to work in. Timothy Chien has a polyfill that does just this, using iframes to simulate Web Workers, so that part is done for us.
The Caja project has a wiki page on how to use Caja as a standalone client-side sanitizer:
ant
html-sanitizer-minified.js
or html-css-sanitizer-minified.js
in your pagehtml_sanitize(...)
The worker script only needs to follow those instructions:
importScripts('html-css-sanitizer-minified.js'); // or 'html-sanitizer-minified.js' var urlTransformer, nameIdClassTransformer; // customize if you need to filter URLs and/or ids/names/classes urlTransformer = nameIdClassTransformer = function(s) { return s; }; // when we receive some HTML self.onmessage = function(event) { // sanitize, then send the result back postMessage(html_sanitize(event.data, urlTransformer, nameIdClassTransformer)); };
(A bit more code is needed to get the simworker library working, but it's not important to this discussion.)
Demo: https://dl.dropbox.com/u/291406/html-sanitize/demo.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With