I want to allow embedding of HTML but avoid DoS due to deeply nested HTML documents that crash some browsers. I'd like to be able to accommodate 99.9% of documents, but reject those that nest too deeply.
Two closely related questions:
Document depth is defined as 1 + the maximum number of parent traversals needed to reach the document root from any node in a document. For example, in
<html>                    <!-- 1 -->
  <body>                  <!-- 2 -->
    <div>                 <!-- 3 -->
      <table>             <!-- 4 -->
        <tbody>           <!-- 5 -->
          <tr>            <!-- 6 -->
            <td>          <!-- 7 -->
              Foo         <!-- 8 -->
the maximum depth is 8 since the text node "Foo" has 8 ancestors. Ancestor here is interpreted non-strictly, i.e. every node is its own ancestor and its own descendant.
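To make the definition concrete, here is a minimal sketch of the depth computation. It assumes a hypothetical plain-object tree where each node has a `children` array (that shape and the function name are mine, not from any library); in a browser the same walk could use `childNodes` instead. It uses an explicit stack rather than recursion, since the whole point is that the input may be very deep.

```javascript
// Hypothetical node shape (an assumption): { name: string, children: Node[] }.
function documentDepth(root) {
  if (!root) return 0;
  let maxDepth = 0;
  // Explicit stack instead of recursion, so a very deep tree
  // cannot overflow the JavaScript call stack.
  const stack = [{ node: root, depth: 1 }];
  while (stack.length > 0) {
    const { node, depth } = stack.pop();
    if (depth > maxDepth) maxDepth = depth;
    for (const child of node.children || []) {
      stack.push({ node: child, depth: depth + 1 });
    }
  }
  return maxDepth;
}

// The example above: html > body > div > table > tbody > tr > td > "Foo"
const names = ['html', 'body', 'div', 'table', 'tbody', 'tr', 'td', '#text'];
let example = null;
for (let i = names.length - 1; i >= 0; i--) {
  example = { name: names[i], children: example ? [example] : [] };
}
console.log(documentDepth(example)); // 8
```

The root counts as depth 1 and the text node counts as a level of its own, which matches the "1 + parent traversals" wording.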
Opera has some table nesting stats, which suggest that 99.99% of documents have a table nesting depth of less than 22, but that data does not contain whole document depth.
EDIT:
If people would like to criticize the HTML sanitization library instead of answering this question, please do. http://code.google.com/p/owasp-java-html-sanitizer/wiki/AttackReviewGroundRules explains how to find the code, where to find a testbed that lets you try out attacks, and how to report issues.
EDIT:
I asked Adam Barth, and he very kindly pointed me to the WebKit code that handles this.
WebKit, at least, enforces this limit. When a tree builder is created, it receives a configurable maximum tree depth:
m_treeBuilder(HTMLTreeBuilder::create(this, document, reportErrors, usePreHTML5ParserQuirks(document), maximumDOMTreeDepth(document)))
and it is tested by the block-nesting-cap test.
It may be worth asking [email protected]. Their study from 2005 (http://code.google.com/webstats/) doesn't cover your particular question. They sampled more than a billion documents though, and are interested in hearing about anything you feel is worth examining.
--[Update]--
Here's a crude script I wrote to test the browsers I have (putting the number of elements to nest into the query string):
// Number of elements to nest, taken from the query string (e.g. ?3000).
var n = Number(window.location.search.substring(1));
var outboundHtml = '';
var inboundHtml = '';
for (var i = 0; i < n; i++) {
  outboundHtml += '<div>' + (i + 1); // opening tag plus a visible level number
  inboundHtml += '</div>';
}
// Write the nested markup into a new window.
var testWindow = window.open();
testWindow.document.open();
testWindow.document.write(outboundHtml + inboundHtml);
testWindow.document.close();
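For trying element types other than DIV, or for saving the payload to a file instead of writing it to a new window, the same markup can be built as a plain string. This is only a sketch; `buildNestedHtml` and its parameters are names I made up for illustration. Each level opens every tag in `tags`, so nesting n table cells with `['table', 'tr', 'td']` yields 3n elements.

```javascript
// Build n levels of nesting, cycling through `tags` at each level.
// The function and parameter names are illustrative assumptions.
function buildNestedHtml(n, tags = ['div']) {
  const open = [];
  const close = [];
  for (let i = 0; i < n; i++) {
    for (const tag of tags) {
      open.push('<' + tag + '>');
      close.push('</' + tag + '>');
    }
    open.push(String(i + 1)); // label each level, as the script above does
  }
  // Closing tags must come out in the reverse of the opening order.
  return open.join('') + close.reverse().join('');
}

console.log(buildNestedHtml(2));               // <div>1<div>2</div></div>
console.log(buildNestedHtml(1, ['ul', 'li'])); // <ul><li>1</li></ul>
```

The returned string can be passed to `document.write` as above, or written to an .html file and opened directly.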
And here are my findings (may be specific to my machine, Win XP, 3 GB RAM):
More on Chrome:
Changing the DIV to a SPAN resulted in Chrome being able to nest 9202 elements before crashing. So it's not the size of the HTML that is the reason (although SPAN elements may be more lightweight).
Nesting 2077 table cells (<table><tr><td>) worked (6231 elements) until you scrolled down to cell 445, at which point it crashed, so in practice you can't nest 445 table cells (1335 elements).
Testing with files generated from the script (as opposed to writing to new windows) gives slightly higher tolerances, but Chrome still crashed.
You can nest 1409 list items (<ul><li>) before it crashes, which is interesting because:
Setting a DOCTYPE is effective in IE8 (putting it into standards mode, i.e. var outboundHtml = '<!DOCTYPE html>';): it will not nest 792 list items (the tab crashes/closes) or 1593 DIVs. It made no difference in IE8 whether the test was generated from the script or loaded from a file.
So the nesting limit of a browser apparently depends on the type of HTML elements the attacker is injecting and on the layout engine. There could be some crashing HTML considerably smaller than this. Either way, we have a plain-HTML DoS for IE8, Chrome and Safari users with a fairly small payload.
It seems if you are going to allow users to post HTML that gets rendered on one of your pages, it is worth considering a limit on nested elements if there is a generous size limit.
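As a rough illustration of such a limit, here is a sketch that tracks nesting depth over a stream of parser events and reports when a cap is exceeded. The event shape (`open`, `close`, `text`) and the function name are assumptions for the example; it presumes a real HTML parser upstream has already produced balanced events, and the cap itself is whatever you decide your documents need.

```javascript
// Hypothetical event shape (an assumption): { type: 'open' | 'close' | 'text' },
// as might be emitted by a streaming HTML parser.
function exceedsDepthLimit(events, limit) {
  let depth = 0;
  for (const ev of events) {
    if (ev.type === 'open') {
      depth++;
      if (depth > limit) return true;
    } else if (ev.type === 'close') {
      if (depth > 0) depth--; // ignore stray close events
    } else if (ev.type === 'text') {
      // A text node sits one level below its parent element.
      if (depth + 1 > limit) return true;
    }
  }
  return false;
}

const events = [
  { type: 'open' }, { type: 'open' }, { type: 'text' },
  { type: 'close' }, { type: 'close' },
];
console.log(exceedsDepthLimit(events, 3)); // false
console.log(exceedsDepthLimit(events, 2)); // true
```

Because it stops at the first event past the cap, the check does constant work per event and never has to build the deep tree it is trying to reject.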