
When to use document.implementation.createHTMLDocument?

Tags:

javascript

What are some use cases, and is it deprecated? I read at http://groups.google.com/group/envjs/browse_thread/thread/6c22d0f959666009/c389fc11537f2a97 that it's "non-standard and not supported by any modern browser".

About document.implementation at http://javascript.gakaa.com/document-implementation.aspx:

Returns a reference to the W3C DOMImplementation object, which represents, to a limited degree, the environment that makes up the document container: the browser, for our purposes. Methods of the object let you see which DOM modules the browser reports supporting. This object is also a gateway to creating virtual W3C Document and DocumentType objects outside of the current document tree. Thus, in Netscape 6 you can use the document.implementation property as a start to generating a nonrendered document for external XML documents. See the DOMImplementation object for details about the methods and their browser support.

Given that it provides methods (such as createHTMLDocument) for creating a non-rendered document outside of the current document tree, would it be safe to feed it untrusted third-party HTML input that may contain XSS? I ask because I would like to use createHTMLDocument to traverse third-party HTML input. Could that be one of the use cases?

Polar asked Oct 12 '11 09:10


3 Answers

I always use this because it doesn't request images, execute scripts or affect the page's styling:

function cleanHTML( html ) {
    // Parse into a detached document; the explicit "" title is for older browsers that require the argument.
    var root = document.implementation.createHTMLDocument( "" ).body;

    root.innerHTML = html;

    // Manipulate the DOM here.
    // jQuery is incidental: it just avoids exhausting boilerplate code to make the point.
    $(root).find("script, style, img").remove();

    return root.innerHTML;
}


cleanHTML( '<div>hello</div><img src="google"><script>alert("hello");</script><style type="text/css">body {display: none !important;}</style>' );
//returns "<div>hello</div>" with the page unaffected
Esailija answered Sep 29 '22 18:09


Yes. You can use this to load untrusted third-party content and strip it of dangerous tags and attributes before including it in your own document. There is some great research incorporating this trick, described at http://blog.kotowicz.net/2011/10/sad-state-of-dom-security-or-how-we-all.html.

The technique documented by Esailija above is insufficient, however. You also need to strip out most attributes. An attacker could set an onerror or onmouseover attribute to malicious JS. The style attribute can be used to include CSS that runs malicious JS. iframe and other embedding tags can also be abused. View the source at https://html5sec.org/xssme/xssme2 to see a version of this technique.
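To make the attribute stripping concrete, here is a rough allowlist sketch. The names (ALLOWED_TAGS, ALLOWED_ATTRIBUTES, isSafeAttribute, sanitizeTree, sanitizeHTML) and the contents of both lists are assumptions made up for this example, not taken from the linked pages, so treat it as an illustration rather than a vetted sanitizer:

```javascript
// Hypothetical allowlists for this sketch; tune them for your own use case.
var ALLOWED_TAGS = ["a", "b", "blockquote", "br", "code", "div", "em", "i",
                    "li", "ol", "p", "pre", "span", "strong", "ul"];
var ALLOWED_ATTRIBUTES = ["href", "title"];

// True only for allowlisted attributes; href must also start with a known-safe prefix.
function isSafeAttribute(name, value) {
    name = name.toLowerCase();
    if (ALLOWED_ATTRIBUTES.indexOf(name) === -1) {
        return false; // drops onerror, onmouseover, style, src, ...
    }
    if (name === "href") {
        // Anything that is not http(s), mailto, a rooted path or a fragment is rejected.
        return /^(https?:|mailto:|\/|#)/i.test(value.trim());
    }
    return true;
}

// Remove disallowed elements and attributes below the given root node.
function sanitizeTree(root) {
    var elements = root.querySelectorAll("*");
    for (var i = 0; i < elements.length; i++) {
        var el = elements[i];
        if (ALLOWED_TAGS.indexOf(el.tagName.toLowerCase()) === -1) {
            el.remove(); // removes the element together with its subtree
            continue;
        }
        // Copy the attribute list first, since removing attributes changes it.
        var attrs = Array.prototype.slice.call(el.attributes);
        for (var j = 0; j < attrs.length; j++) {
            if (!isSafeAttribute(attrs[j].name, attrs[j].value)) {
                el.removeAttribute(attrs[j].name);
            }
        }
    }
}

// Parse untrusted markup in a detached document, clean it, and serialize it again.
function sanitizeHTML(html) {
    var root = document.implementation.createHTMLDocument("").body;
    root.innerHTML = html;
    sanitizeTree(root);
    return root.innerHTML;
}
```

An allowlist is used instead of a blocklist so that tags and attributes nobody thought of are dropped by default. Relative links such as page.html are rejected too, which is stricter than necessary but keeps the href check short.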

jsha answered Sep 29 '22 18:09


A cleaner variant of the answers from @Esailija and @Greg: this function creates another document outside the current document's tree and removes all scripts, styles and images from it:

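// Assumed to be a list kept elsewhere in your script; declared here so the function runs on its own.
var documentsList = [];

function insertDocument (myHTML) {
    // Despite the name, this holds the body of the new, detached document ("" title for older browsers).
    var newHTMLDocument = document.implementation.createHTMLDocument("").body;
    newHTMLDocument.innerHTML = myHTML;
    // Remove every script, style and image element.
    [].forEach.call(newHTMLDocument.querySelectorAll("script, style, img"), function(el) { el.remove(); });
    documentsList.push(newHTMLDocument);
    // Requires jQuery: wraps the cleaned markup in a jQuery object.
    return $(newHTMLDocument.innerHTML);
}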

This one is fantastic for ajax requests: scraping the fetched content will be faster :)
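As a rough illustration of the scraping use case, the sketch below pulls link targets out of an HTML string (for example the text of an ajax response) without inserting it into the page. collectHrefs and extractLinks are names made up for this example:

```javascript
// Collect the href of every link below the given root node.
function collectHrefs(root) {
    var links = root.querySelectorAll("a[href]");
    var hrefs = [];
    for (var i = 0; i < links.length; i++) {
        // getAttribute returns the raw attribute text, not a resolved URL.
        hrefs.push(links[i].getAttribute("href"));
    }
    return hrefs;
}

// Parse the markup in a detached document and read the links from it.
function extractLinks(html) {
    var root = document.implementation.createHTMLDocument("").body;
    root.innerHTML = html;
    return collectHrefs(root);
}
```

In a browser, extractLinks('<p><a href="/a">A</a> <a href="/b">B</a></p>') should give ["/a", "/b"]. Reading the raw attribute avoids resolving relative links against the detached document.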

Mohammed AlBanna answered Sep 29 '22 18:09