Sanitizing untrusted HTML5

I want to be able to accept HTML from untrusted users and sanitize it so that I can safely include it in pages on my website. By this I mean that markup should not be stripped or escaped, but should be passed through essentially unchanged unless it contains dangerous tags such as <script> or <iframe>, dangerous attributes such as onload, or dangerous CSS properties such as background URLs. (Apparently some older IEs will execute javascript URLs in CSS?)

Serving the content from a different domain, enclosed in an iframe, is not a good option because there is no way to tell in advance how tall the iframe has to be so it will always look ugly for some pages.

I looked into HTML Purifier, but it looks like it doesn't support HTML5 yet. I also looked into Google Caja, but I'm looking for a solution that doesn't use scripts.

Does anyone know of a library that will accomplish this? PHP is preferred, but beggars can't be choosers.

asked Jul 17 '13 by Brian Bi



1 Answer

The blacklisting approach puts you under constant upgrade pressure: every time browsers start supporting a new feature, you MUST bring your sanitizing tool up to the same level. Such changes happen more often than you might think.

Whitelisting (which you can achieve with strip_tags and a well-defined list of allowed tags) of course shrinks the options for your users, but it puts you on the safe side.

On my own sites my policy is to apply blacklisting on pages for highly trusted users (such as admins) and whitelisting on all other pages. That puts me in a position where I don't have to invest much effort in the blacklist. With a more mature role & permission concept you can fine-grain your blacklists and whitelists even further.
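The per-role policy above can be sketched roughly like this (the function and role names are illustrative assumptions, not part of any library, and a real blacklist would need far more than one pattern):

```php
<?php
// Sketch of a per-role sanitization policy. "admin" and the helper
// name sanitize_for_role() are hypothetical, chosen for illustration.

function sanitize_for_role(string $html, string $role): string
{
    if ($role === 'admin') {
        // Trusted users: blacklist only the most dangerous elements.
        // (Incomplete by design -- a production blacklist needs much more.)
        return preg_replace('#<(script|iframe)\b[^>]*>.*?</\1\s*>#is', '', $html);
    }
    // Everyone else: strict tag-level whitelist via strip_tags().
    // Note: strip_tags() keeps the text inside removed tags and does
    // not touch attributes on the tags it allows.
    return strip_tags($html, '<p><b><i><em><strong><a>');
}

// Untrusted input keeps its formatting but loses the <script> element:
echo sanitize_for_role('<p>Hello <script>evil()</script><b>world</b></p>', 'user');
// <p>Hello evil()<b>world</b></p>
```

Note that the inner text of a stripped tag survives, and attributes on whitelisted tags pass through untouched, so strip_tags alone is not a complete sanitizer.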


UPDATE: I guess you are looking for these:

  • Allow user submitted HTML in PHP
  • with HTMLpurifier, how to add a couple attributes to the default whitelist, e.g. 'onclick'

I take the point that strip_tags whitelists at the tag level but accepts everything at the attribute level. Interestingly, HTMLpurifier seems to do its whitelisting at the attribute level. Thanks, this was a nice thing to learn here.
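That tag-versus-attribute difference is easy to demonstrate: strip_tags() passes a whitelisted tag through verbatim, attributes and all, so an event handler survives the "whitelist". (HTML Purifier, by contrast, lets you whitelist per-tag attributes, e.g. via its HTML.Allowed config directive.)

```php
<?php
// strip_tags() whitelists tags only -- attributes, including event
// handlers, pass through untouched on the tags it allows.

$html  = '<a href="#" onclick="alert(1)">click</a>';
$clean = strip_tags($html, '<a>');

echo $clean;
// <a href="#" onclick="alert(1)">click</a>  -- onclick survived!
```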

answered Sep 18 '22 by Quicker