XSS - Which HTML Tags and Attributes can trigger Javascript Events?


I'm trying to code a secure and lightweight white-list based HTML purifier which will use DOMDocument. In order to avoid unnecessary complexity I am willing to make the following compromises:

HTML comments are removed
script and style tags are stripped altogether
only the child nodes of the body tag will be returned
all HTML attributes that can trigger Javascript events will either be validated or removed
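The first three compromises can be sketched as one tree walk. This is a hedged sketch in Java rather than PHP, and it uses the JDK's XML DOM (org.w3c.dom) as a stand-in for DOMDocument. That parser only accepts well-formed markup, so this is an illustration of the walk, not a substitute for a real HTML parser. The class name StripPass and the synthetic `<body>` wrapper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class StripPass {
    // Removes comment nodes and <script>/<style> elements below the given node.
    static void strip(Node parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // read before child may be detached
            boolean drop = child.getNodeType() == Node.COMMENT_NODE
                || (child.getNodeType() == Node.ELEMENT_NODE
                    && (child.getNodeName().equalsIgnoreCase("script")
                        || child.getNodeName().equalsIgnoreCase("style")));
            if (drop) {
                parent.removeChild(child);
            } else {
                strip(child);
            }
            child = next;
        }
    }

    // Parses a well-formed fragment inside a synthetic <body> and returns that
    // body element after stripping; only its children would be returned.
    static Element parseAndStrip(String fragment) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPEs so untrusted input cannot declare external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(
                new InputSource(new StringReader("<body>" + fragment + "</body>")));
            Element body = doc.getDocumentElement();
            strip(body);
            return body;
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        Element body = parseAndStrip("<p>hi<!-- note --></p><script>alert(1)</script><b>ok</b>");
        System.out.println(body.getTextContent()); // prints "hiok"
    }
}
```

The walk saves the next sibling before removing a node: a detached node's getNextSibling() returns null, so reading it after removal would silently end the loop and skip the remaining siblings.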

I've been reading a lot about XSS attacks and prevention, and I hope I'm not being too naive (if I am, please let me know!) in assuming that if I follow all the rules above, I will be safe from XSS.

The problem is I am not sure what other tags and attributes (in any [X]HTML version and/or browser versions/implementations) can trigger Javascript events, besides the default Javascript event attributes:

onAbort
onBlur
onChange
onClick
onDblClick
onDragDrop
onError
onFocus
onKeyDown
onKeyPress
onKeyUp
onLoad
onMouseDown
onMouseMove
onMouseOut
onMouseOver
onMouseUp
onMove
onReset
onResize
onSelect
onSubmit
onUnload
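The list above is not exhaustive; HTML5 alone adds handlers such as oninput, onmouseenter, onwheel and onanimationstart. Every HTML event-handler content attribute is named "on" followed by the event name, and attribute names are case-insensitive. One hedged fallback, therefore, is to reject any attribute whose lower-cased name starts with "on". The sketch is in Java, the language of the OWASP snippet in the answer, and the class name EventAttributes is hypothetical. A strict attribute whitelist is still the stronger defense, because it also covers vectors like style and href that are not event handlers.

```java
import java.util.Locale;

final class EventAttributes {
    // Event-handler content attributes are all "on" + event name; HTML attribute
    // names are case-insensitive, so compare the lower-cased name.
    // This may also catch harmless names that start with "on", which errs on the safe side.
    static boolean isEventHandler(String attributeName) {
        return attributeName.toLowerCase(Locale.ROOT).startsWith("on");
    }

    public static void main(String[] args) {
        for (String name : new String[] {"onClick", "ONMOUSEENTER", "oninput", "title", "href"}) {
            System.out.println(name + " -> " + (isEventHandler(name) ? "drop" : "keep"));
        }
    }
}
```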

Are there any other non-default or proprietary event attributes that can trigger Javascript (or VBScript, etc.) events or code execution? I can think of href, style and action, for instance:

<a href="javascript:alert(document.location);">XSS</a> <!-- or -->
<b style="width: expression(alert(document.location));">XSS</b> <!-- or -->
<form action="javascript:alert(document.location);"><input type="submit" /></form>
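All three vectors ride on attributes, which is why a whitelist works better than a blacklist here. If an attribute name is not explicitly allowed for its element, it is dropped, so style and every on* handler disappear without ever being listed. Below is a minimal Java sketch that assumes a small hypothetical allowlist. The ALLOWED map and the AttributeAllowlist class are illustrative and do not come from any library.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class AttributeAllowlist {
    // Hypothetical per-element allowlist; "style" and on* names appear nowhere in it.
    private static final Map<String, Set<String>> ALLOWED = Map.of(
        "a", Set.of("href", "title"),
        "b", Set.of(),
        "form", Set.of("action", "method"),
        "img", Set.of("src", "alt", "title"),
        "input", Set.of("type", "name", "value"),
        "p", Set.of());

    // Element and attribute names are case-insensitive in HTML, so compare lower-cased.
    // Elements missing from the map get an empty set, so all their attributes go.
    static boolean isAllowed(String element, String attribute) {
        Set<String> names = ALLOWED.getOrDefault(element.toLowerCase(Locale.ROOT), Set.of());
        return names.contains(attribute.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("A", "HREF"));     // true
        System.out.println(isAllowed("b", "style"));    // false
        System.out.println(isAllowed("a", "onclick"));  // false
        System.out.println(isAllowed("script", "src")); // false: element not listed
    }
}
```

Passing this check only admits the attribute name. URL-valued attributes such as href and action still need their values validated.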

I will probably just remove any style attributes from HTML tags. The action and href attributes pose a bigger challenge, but I think the following code is enough to make sure their value is either a relative or an absolute URL and not some nasty Javascript code:

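$value = $attribute->value;

// A colon may introduce a scheme: keep the attribute only if the value starts
// with ftp(s), sftp(s), http(s) or mailto (case-insensitive); otherwise drop it.
// Values with no colon at all are treated as relative URLs and kept.
if ((strpos($value, ':') !== false) && (preg_match('~^(?:(?:s?f|ht)tps?|mailto):~i', $value) == 0))
{
    $node->removeAttributeNode($attribute);
}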
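To see what that condition accepts and rejects, here is a direct port to Java, chosen only because it is the other language on this page. The class and method names are hypothetical. Pattern.CASE_INSENSITIVE stands in for PHP's ~i modifier, and the pattern text is unchanged.

```java
import java.util.regex.Pattern;

final class SchemeCheck {
    // Same pattern as the PHP code: ftp, ftps, sftp, sftps, http, https or mailto,
    // followed by ':'; the leading ^ makes find() match only at the start.
    private static final Pattern ALLOWED_SCHEME =
        Pattern.compile("^(?:(?:s?f|ht)tps?|mailto):", Pattern.CASE_INSENSITIVE);

    // Mirrors the PHP condition: keep the value if it has no colon at all,
    // or if it starts with an allowed scheme; otherwise the attribute is removed.
    static boolean keep(String value) {
        return value.indexOf(':') < 0 || ALLOWED_SCHEME.matcher(value).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://example.com/", "mailto:someone@example.com", "/images/logo.png",
            "javascript:alert(1)", " javascript:alert(1)", "page?t=10:30"};
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + (keep(s) ? "keep" : "remove"));
        }
    }
}
```

Running it shows two things. A leading space before javascript: is rejected, because the value contains a colon and fails the anchored match, which is safe. A harmless relative URL such as page?t=10:30 is also rejected, because any colon is treated as a scheme separator.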

So, my two obvious questions are:

Am I missing any tags or attributes that can trigger events?
Is there any attack vector that is not covered by these rules?

After a lot of testing, pondering and researching, I've come up with the following (rather simple) implementation, which appears to be immune to every XSS attack vector I could throw at it.

I highly appreciate all your valuable answers, thanks.


asked Aug 07 '11 21:08

Alix Axel

1 Answer

You mention href and action as places where javascript: URLs can appear, but you're missing the src attribute, along with a bunch of other URL-loading attributes.

Line 399 of HtmlPolicyBuilder in the OWASP Java HTML Sanitizer is where that whitelisting sanitizer defines its URL attributes:

private static final Set<String> URL_ATTRIBUTE_NAMES = ImmutableSet.of(
  "action", "archive", "background", "cite", "classid", "codebase", "data",
  "dsync", "formaction", "href", "icon", "longdesc", "manifest", "poster",
  "profile", "src", "usemap");

The HTML5 Index contains a summary of attribute types. It doesn't mention some conditional cases like <input type=URL value=...>, but if you scan that list for "valid URL" and friends, you should get a decent idea of what HTML5 adds. The set of HTML 4 attributes with type %URI is also informative.

Your protocol whitelist looks very similar to the OWASP sanitizer one. The addition of ftp and sftp looks innocuous enough.
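Here is a sketch that applies a protocol whitelist to every name in the OWASP URL attribute list quoted above, not just href and action. It copies that list into Set.of, which avoids a dependency on Guava's ImmutableSet. As an alternative to the question's regex, it extracts a scheme candidate the way RFC 3986 delimits one: the text before the first ':' counts only if no '/', '?' or '#' comes earlier. The scheme set, the class name and the method names are hypothetical.

```java
import java.util.Locale;
import java.util.Set;

final class UrlAttributes {
    // Copied from the OWASP list; Set.of replaces Guava's ImmutableSet here.
    static final Set<String> URL_ATTRIBUTE_NAMES = Set.of(
        "action", "archive", "background", "cite", "classid", "codebase", "data",
        "dsync", "formaction", "href", "icon", "longdesc", "manifest", "poster",
        "profile", "src", "usemap");

    // Hypothetical scheme allowlist with the same intent as the question's regex.
    static final Set<String> ALLOWED_SCHEMES =
        Set.of("http", "https", "ftp", "ftps", "sftp", "mailto");

    // Returns the lower-cased text before the first ':' when no '/', '?' or '#'
    // precedes it (such a colon cannot end a scheme); otherwise null = relative URL.
    static String schemeOf(String url) {
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == ':') return url.substring(0, i).toLowerCase(Locale.ROOT);
            if (c == '/' || c == '?' || c == '#') return null;
        }
        return null;
    }

    // Non-URL attributes are left to other rules; URL attributes need a relative
    // value or an allowlisted scheme.
    static boolean keepAttribute(String name, String value) {
        if (!URL_ATTRIBUTE_NAMES.contains(name.toLowerCase(Locale.ROOT))) {
            return true;
        }
        String scheme = schemeOf(value);
        return scheme == null || ALLOWED_SCHEMES.contains(scheme);
    }

    public static void main(String[] args) {
        System.out.println(keepAttribute("src", "javascript:alert(1)"));
        System.out.println(keepAttribute("formaction", "https://example.com/"));
        System.out.println(keepAttribute("href", "page?t=10:30"));
    }
}
```

Because the candidate scheme is looked up in an allowlist, odd prefixes such as a leading space or an embedded tab, which browsers strip before parsing a URL, simply fail the lookup instead of slipping through.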

A good source of security-related schema information for HTML elements and attributes is the set of Caja JSON whitelists, which are used by the Caja JS HTML sanitizer.

How are you planning on rendering the resulting DOM? If you're not careful, then even if you strip out all the <script> elements, an attacker might get a buggy renderer to produce content that a browser interprets as containing a <script> element. Consider the following valid HTML, which does not contain a script element:

<textarea><&#47;textarea><script>alert(1337)</script></textarea>

A buggy renderer might output the contents of this as:

<textarea></textarea><script>alert(1337)</script></textarea>

which does contain a script element.
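The fix on the output side is to escape text nodes when serializing, including inside textarea, whose content the parser treats as text. This Java sketch (class and method names hypothetical) feeds the textarea's decoded text content, </textarea><script>alert(1337)</script>, to a naive renderer and to one that escapes &, < and >.

```java
final class TextareaRender {
    // Escapes &, < and > so that text content can never be re-parsed as markup.
    static String escapeText(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // The buggy renderer: writes the decoded text content back out verbatim.
    static String renderBuggy(String textContent) {
        return "<textarea>" + textContent + "</textarea>";
    }

    // A safer renderer: escapes the text content before writing it.
    static String renderEscaped(String textContent) {
        return "<textarea>" + escapeText(textContent) + "</textarea>";
    }

    public static void main(String[] args) {
        String content = "</textarea><script>alert(1337)</script>";
        System.out.println(renderBuggy(content));
        System.out.println(renderEscaped(content));
    }
}
```

The first line printed reproduces the dangerous output shown above. The second keeps the payload inert as text.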

(Full disclosure: I wrote chunks of both HTML sanitizers mentioned above.)


answered Nov 15 '22 12:11

Mike Samuel
