What are some use cases, and is it deprecated? I found a claim at http://groups.google.com/group/envjs/browse_thread/thread/6c22d0f959666009/c389fc11537f2a97 that it's "non-standard and not supported by any modern browser".
About document.implementation, from http://javascript.gakaa.com/document-implementation.aspx:
Returns a reference to the W3C DOMImplementation object, which represents, to a limited degree, the environment that makes up the document container (the browser, for our purposes). Methods of the object let you see which DOM modules the browser reports supporting. This object is also a gateway to creating virtual W3C Document and DocumentType objects outside of the current document tree. Thus, in Netscape 6 you can use the document.implementation property as a start to generating a nonrendered document for external XML documents. See the DOMImplementation object for details about the methods and their browser support.
Given that it provides methods (such as createHTMLDocument) for creating a non-rendered document outside of the current document tree, would it be safe to feed it untrusted third-party HTML input that may contain XSS? I ask because I would like to use createHTMLDocument to traverse third-party HTML input. Could that be one of the use cases?
The DOMImplementation interface represents an object providing methods which are not dependent on any particular document. Such an object is returned by the Document.implementation property.
The Document Object Model (DOM) is a programming API for HTML and XML documents. It defines the logical structure of documents and the way a document is accessed and manipulated.
I always use this because it doesn't make requests to images, execute scripts or affect styling:
function cleanHTML(html) {
    // The title argument is optional in current browsers; older ones may require it.
    var root = document.implementation.createHTMLDocument("").body;
    root.innerHTML = html;
    // Manipulate the DOM here. jQuery is not essential; it just saves boilerplate.
    $(root).find("script, style, img").remove();
    return root.innerHTML;
}

cleanHTML('<div>hello</div><img src="google"><script>alert("hello");</script><style type="text/css">body {display: none !important;}</style>');
// returns "<div>hello</div>" with the page unaffected
Yes. You can use this to load untrusted third-party content and strip it of dangerous tags and attributes before including it in your own document. There is some great research that uses this trick, described at http://blog.kotowicz.net/2011/10/sad-state-of-dom-security-or-how-we-all.html.
The technique documented by Esailija above is insufficient, however. You also need to strip out most attributes. An attacker could set an onerror or onmouseover attribute to malicious JS. In some (mostly older) browsers, the style attribute can be used to include CSS that runs malicious JS. Iframes and other embedding tags can also be abused. View the source of https://html5sec.org/xssme/xssme2 to see a version of this technique.
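To make the attribute stripping described above concrete, here is a minimal allowlist sketch. The names ALLOWED_TAGS, ALLOWED_ATTRS, SAFE_URL, isAllowedAttribute and sanitizeHTML are made up for this example, and the lists themselves are assumptions you would need to tune. It is not a vetted sanitizer; it only shows the idea of keeping what you know is safe instead of removing what you know is dangerous.

```javascript
// Hypothetical allowlists for illustration only; adjust them to your content.
const ALLOWED_TAGS = new Set([
  "a", "b", "br", "div", "em", "i", "li", "ol", "p", "span", "strong", "ul",
]);
const ALLOWED_ATTRS = new Set(["href", "title"]);
// Only http(s), mailto, relative paths and fragments survive as link targets.
const SAFE_URL = /^(https?:|mailto:|\/|#)/i;

// Pure check, no DOM needed: may this attribute stay on a sanitized element?
function isAllowedAttribute(name, value) {
  const attr = name.toLowerCase();
  if (!ALLOWED_ATTRS.has(attr)) return false; // drops on*, style, srcdoc, ...
  if (attr === "href") return SAFE_URL.test(value.trim());
  return true;
}

// Browser-only part: parse into an inert document, then filter tags and attributes.
// `doc` defaults to the page's document, so this function needs a DOM to run.
function sanitizeHTML(html, doc = document) {
  const inert = doc.implementation.createHTMLDocument("");
  inert.body.innerHTML = html;
  // Snapshot the element list first because the loop mutates the tree.
  for (const el of Array.from(inert.body.querySelectorAll("*"))) {
    if (!ALLOWED_TAGS.has(el.tagName.toLowerCase())) {
      el.remove(); // removes the element together with its subtree
      continue;
    }
    for (const attr of Array.from(el.attributes)) {
      if (!isAllowedAttribute(attr.name, attr.value)) {
        el.removeAttribute(attr.name);
      }
    }
  }
  return inert.body.innerHTML;
}
```

In a browser you would call sanitizeHTML(untrustedHTML) and insert the returned string. Serializing and re-parsing markup has its own pitfalls, so for anything security-critical a maintained library such as DOMPurify is likely the safer route.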
A cleaner take on @Esailija's and @Greg's answers: this function creates another document outside the current document's tree and removes all scripts, styles and images from it:
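var documentsList = []; // assumed to exist in the original; declared here so the snippet runs

function insertDocument(myHTML) {
    // Despite the name, this is the <body> of a new, non-rendered document.
    var newHTMLDocument = document.implementation.createHTMLDocument("").body;
    newHTMLDocument.innerHTML = myHTML;
    [].forEach.call(newHTMLDocument.querySelectorAll("script, style, img"), function (el) {
        el.remove();
    });
    documentsList.push(newHTMLDocument);
    return $(newHTMLDocument.innerHTML); // jQuery wraps the cleaned markup
}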
This works well with ajax requests, and scraping the fetched content will be faster :)