
Construct a DOM tree from a string without loading resources (specifically images)

I am grabbing RSS feeds via AJAX. After processing them, I have an HTML string that I want to manipulate using various jQuery functions. To do this, I need a tree of DOM nodes.

I can pass an HTML string to the jQuery() function.
I can assign it as innerHTML to some hidden node and use that.
I have even tried using Mozilla's nonstandard range.createContextualFragment().

The problem with all of these solutions is that when my HTML snippet has an <img> tag, Firefox dutifully fetches whatever image is referenced. Since this processing happens in the background and is never displayed to the user, I'd like to get a DOM tree without the browser loading all the images contained in it.

Is this possible with JavaScript? I don't mind if it's Mozilla-only, as I'm already using JavaScript 1.7 features (which seem to be Mozilla-only for now).

gfxmonk asked Feb 20 '10 12:02



2 Answers

The answer is this:

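// Parse into a separate document; the string is never inserted into the live page.
var parser = new DOMParser();
var htmlDoc = parser.parseFromString(htmlString, "text/html");
// Wrap the parsed document so jQuery traversal methods work on it.
var jdoc = $(htmlDoc);
console.log(jdoc.find('img'));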

If you watch your network requests, you'll notice that none are made, even though the HTML string is parsed and wrapped by jQuery.
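
As a rough sketch of how the parsed document might then be used, the helper below lists the image URLs found in the string. The name collectImageSources and the optional parser argument are assumptions made for illustration (the argument only exists so the function can be exercised outside a browser); they are not part of jQuery or the DOM API.

```javascript
// Hypothetical helper: list the src values of all <img> tags in an HTML string.
// The string is parsed into a separate document rather than the live page,
// which, as described above, should not cause the images to be requested.
function collectImageSources(htmlString, parser) {
  // Assumption: in a browser, fall back to the built-in DOMParser.
  parser = parser || new DOMParser();
  var doc = parser.parseFromString(htmlString, "text/html");
  var images = doc.querySelectorAll("img");
  var sources = [];
  for (var i = 0; i < images.length; i++) {
    // getAttribute returns the raw attribute text as written in the markup.
    sources.push(images[i].getAttribute("src"));
  }
  return sources;
}
```

In a browser this would be called as collectImageSources(htmlString), with the second argument left out.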

argyle answered Oct 14 '22 11:10


The obvious answer is to parse the string and remove the src attributes from img tags (and do the same for any other external resources you don't want to load). But you'll have already thought of that, and I'm sure you're looking for something less troublesome. I'm also assuming you've already tried removing the src attribute after having jQuery parse the string but before appending it to the document, and found that the images are still being requested.

I'm not coming up with anything else, but you may not need to do full parsing; this replacement should do it in Firefox with some caveats:

// The g flag matters: with a plain string pattern, replace() only changes the first match.
thestring = thestring.replace(/<img /g, "<img src='' ");

The caveats:

  • This appears to work in the current Firefox. That doesn't mean that subsequent versions won't choose to handle duplicated src attributes differently.
  • This assumes the literal string "<img " appears only at the start of an image tag. That is a reasonable general-purpose assumption, but the string could appear in an attribute value on a sufficiently...interesting...page, especially in an inline onclick handler like this: <a href='#' onclick='$("frog").html("<img src=\"spinner.gif\">")'> (Although in that example, the false positive replacement is harmless.)

This is obviously a hack, but in a limited environment with reasonably well-known data...
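
For comparison, the "obvious answer" from the start of this answer (rewriting the src attributes in the string before anything parses it) could be sketched roughly as below. The function name neutralizeImageSources and the choice of data-src as the replacement attribute are assumptions for illustration. It is still regex-based, so it shares the false-positive caveat above and will also misbehave if an attribute value contains a ">" character.

```javascript
// Hypothetical helper: rename src to data-src on every <img> tag so that a
// later parse finds no image URL to fetch. The original URL stays available
// under data-src.
function neutralizeImageSources(html) {
  // Match each <img ...> tag (any case), then rewrite only inside that tag.
  return html.replace(/<img\b[^>]*>/gi, function (tag) {
    // Require whitespace before "src" so names like data-src are left alone.
    return tag.replace(/\ssrc\s*=/gi, " data-src=");
  });
}
```

The result can then be handed to jQuery() or innerHTML as before, and the original URLs read back from the data-src attribute when they are needed.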

T.J. Crowder answered Oct 14 '22 12:10