
When to use NodeIterator

A benchmark compares querySelectorAll (QSA) plus .forEach against a NodeIterator:

toArray(document.querySelectorAll("div > a.klass")).forEach(function (node) {
  // do something with node
});

var filter = {
    acceptNode: function (node) {
        var condition = node.parentNode.tagName === "DIV" &&
            node.classList.contains("klass") &&
            node.tagName === "A";

        return condition ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT;
    }
};
// FIREFOX Y U SUCK
var iter = document.createNodeIterator(document, NodeFilter.SHOW_ELEMENT, filter, false);
var node;
while (node = iter.nextNode()) {
    // do thing with node    
}

Now either NodeIterators suck or I'm doing it wrong.

Question: When should I use a NodeIterator?

In case you don't know, DOM4 specifies what NodeIterator is.

Raynos asked Oct 29 '11 19:10


2 Answers

NodeIterator (and TreeWalker, for that matter) is almost never used, for a variety of reasons. As a result, information on the topic is scarce, and answers like @gsnedders' come to be, which completely miss the mark. I know this question is almost a decade old, so excuse my necromancy.

1 Initiation & performance

It is true that the initiation of a NodeIterator is waaay slower than a method like querySelectorAll, but that is not the performance you should be measuring.

The thing about NodeIterators is that they are live-ish in the way that, just like an HTMLCollection or live NodeList, you can keep using the object after initiating it once.
The NodeList returned by querySelectorAll is static and will have to be re-initiated every time you need to match newly added elements.

This version of the jsPerf puts the NodeIterator in the preparation code. The actual test only tries to loop over all newly added elements with iter.nextNode(). You can see that the iterator is now orders of magnitude faster.
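
The reuse pattern described above can be sketched roughly as follows. This is not the jsPerf code: the helper name createScanner and its parameters are made up for illustration. The iterator is created once, and every later call only drains what nextNode() has not returned yet, which in practice means nodes that come after the last returned node in document order.

```javascript
// Hypothetical helper: build the iterator once, then reuse it for every scan.
function createScanner(doc, acceptNode) {
  // NodeFilter.SHOW_ELEMENT has the numeric value 1; using the literal keeps
  // this sketch independent of the global NodeFilter object.
  const SHOW_ELEMENT = 1;
  const iter = doc.createNodeIterator(doc, SHOW_ELEMENT, { acceptNode });

  // Each call resumes where the previous call stopped.
  return function scan() {
    const found = [];
    let node;
    while ((node = iter.nextNode())) {
      found.push(node);
    }
    return found;
  };
}

// Browser usage (assumption: the same "div > a.klass" condition as the question):
// const scan = createScanner(document, (node) =>
//   node.matches("div > a.klass") ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_SKIP);
// scan(); // matching nodes present right now
// ...more matching nodes get appended to the page...
// scan(); // only the nodes found after the previously returned one
```

The expensive createNodeIterator call happens a single time; the repeated work is limited to the nextNode() loop.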

2 Selector performance

Okay, cool. Caching the iterator is faster. This version, however, shows another significant difference. I've added 10 classes (done[0-9]) that the selectors shouldn't be matching. The iterator loses about 10% of its speed, while the querySelectors lose 20%.

On the other hand, this version shows what happens when you add another div > at the start of the selector. The iterator loses 33% of its speed, while the querySelectors get a speed INCREASE of 10%.

Removing the initial div > at the start of the selector, as in this version, shows that both methods become slower, because they match more elements than in earlier versions. As expected, the iterator is relatively more performant than the querySelectors in this case.

This means that filtering on the basis of a node's own properties (its classes, attributes, etc.) is probably faster in a NodeIterator, while having a lot of combinators (>, +, ~, etc.) in your selector probably means querySelectorAll is faster.
This is especially true for the descendant combinator (a space). Selecting elements with querySelectorAll('article a') is way easier than manually looping over all ancestors of every a element, looking for one that has a tagName of 'ARTICLE'.

P.S. in §3.2, I give an example of how the exact opposite can be true if you want the opposite of what the space combinator does (exclude a tags with an article ancestor).

3 Impossible selectors

3.1 Simple hierarchical relationships

Of course, manually filtering elements gives you practically unlimited control. This means that you can filter out elements that would normally be impossible to match with CSS selectors. For example, classic CSS selectors can only "look back": selecting divs that are preceded by another div is possible with div + div, but selecting divs that are followed by another div is not (short of the newer :has() pseudo-class, as in div:has(+ div)).

However, inside a NodeFilter, you can achieve this by checking that node.nextElementSibling exists and that its tagName is 'DIV' (nextElementSibling is null for the last element child). The same goes for every selection CSS selectors can't make.
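
A minimal sketch of such a filter, assuming you want every div that is directly followed by another div. The names FILTER and divBeforeDiv are made up for this example, and the numeric fallback values (taken from the DOM standard) are only there so the filter can be tried outside a browser.

```javascript
// Use the real NodeFilter constants in a browser; otherwise fall back to the
// numeric values defined by the DOM standard.
const FILTER = typeof NodeFilter !== "undefined"
  ? NodeFilter
  : { FILTER_ACCEPT: 1, FILTER_REJECT: 2, FILTER_SKIP: 3 };

// Accept only <div> elements whose next element sibling is also a <div>.
const divBeforeDiv = {
  acceptNode(node) {
    const next = node.nextElementSibling; // null for the last element child
    const matches =
      node.tagName === "DIV" && next != null && next.tagName === "DIV";
    return matches ? FILTER.FILTER_ACCEPT : FILTER.FILTER_SKIP;
  },
};

// Browser usage:
// const iter = document.createNodeIterator(
//   document, NodeFilter.SHOW_ELEMENT, divBeforeDiv);
// let node;
// while ((node = iter.nextNode())) { /* node is a div followed by a div */ }
```

The explicit null check matters: without it, the last element child of any parent would throw a TypeError inside the filter.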

3.2 More global hierarchical relationships

Another thing I personally love about NodeFilters is that, when one is passed to a TreeWalker, you can reject a node and its whole sub-tree by returning NodeFilter.FILTER_REJECT instead of NodeFilter.FILTER_SKIP. (A NodeIterator treats FILTER_REJECT the same as FILTER_SKIP, so this pruning only works with a TreeWalker.)

Imagine you want to iterate over all a tags on the page, except for ones with an article ancestor.
With querySelectors, you'd type something like

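// Collect every <a>, then drop the ones that have an <article> ancestor
let a = document.querySelectorAll('a')
a = Array.prototype.filter.call(a, function (node) {
  // Walk up the ancestor chain; bail out as soon as an <article> is found
  while (node = node.parentElement) if (node.tagName === 'ARTICLE') return false
  return true
})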

While in a NodeFilter, you'd only have to type this

return node.tagName === 'ARTICLE' ? NodeFilter.FILTER_REJECT : // ✨ Magic happens here ✨
       node.tagName === 'A'       ? NodeFilter.FILTER_ACCEPT :
                                    NodeFilter.FILTER_SKIP
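
For completeness, here is a sketch of how that return expression could sit inside a full filter handed to a TreeWalker. The names NF, linksOutsideArticles and collectLinks are made up for this example, and the numeric fallback values only exist so the filter can be exercised outside a browser.

```javascript
// Real constants in a browser, numeric values from the DOM standard elsewhere.
const NF = typeof NodeFilter !== "undefined"
  ? NodeFilter
  : { FILTER_ACCEPT: 1, FILTER_REJECT: 2, FILTER_SKIP: 3, SHOW_ELEMENT: 1 };

const linksOutsideArticles = {
  acceptNode(node) {
    return node.tagName === "ARTICLE" ? NF.FILTER_REJECT // prune the whole sub-tree
         : node.tagName === "A"       ? NF.FILTER_ACCEPT // keep this link
         :                              NF.FILTER_SKIP;  // ignore node, keep descending
  },
};

// Hypothetical wrapper: collect every accepted <a> below `root`.
function collectLinks(doc, root) {
  const walker = doc.createTreeWalker(root, NF.SHOW_ELEMENT, linksOutsideArticles);
  const links = [];
  let node;
  while ((node = walker.nextNode())) {
    links.push(node);
  }
  return links;
}

// Browser usage:
// collectLinks(document, document.body).forEach((a) => console.log(a.href));
```

Because the TreeWalker never descends into a rejected article, the filter is not even called for the elements inside it.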

In conclusion

You don't initiate the API every time you need to iterate over nodes of the same kind. Sadly, the question makes exactly that assumption, and the +500 answer (which gives it a lot more credit) addresses neither that error nor any of the perks NodeIterators have.

There are two main advantages NodeIterators have to offer:

  • Live-ishness, as discussed in §1
  • Advanced filtering, as discussed in §3
    (I can't stress enough how useful the NodeFilter.FILTER_REJECT example is)

However, don't use NodeIterators when any of the following is true:

  • The instance is only going to be used once or a few times
  • The query involves complex hierarchical relationships that CSS selectors can express
    (e.g. body.no-js article > div > div a[href^="/"])

Sorry for the long answer :)
Gust van de Wal answered Oct 18 '22 03:10


It's slow for a variety of reasons. Most obvious is the fact that nobody uses it, so far less time has been spent optimizing it. The other problem is that it's massively re-entrant: for every node, the iterator has to call into JS and run the filter function.

If you look at revision three of the benchmark, you'll find I've added a reimplementation of what the iterator is doing using getElementsByTagName("*") and then running an identical filter on that. As the results show, it's massively quicker. Going JS -> C++ -> JS is slow.

Filtering the nodes entirely in JS (the getElementsByTagName case) or C++ (the querySelectorAll case) is far quicker than doing it by repeatedly crossing the boundary.
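
A rough sketch of that getElementsByTagName("*") approach follows. It is not the code from the benchmark revision; the function names isDivChildLink and filterInJs are made up here. The predicate applies the same condition as the filter in the question, but it runs as an ordinary JS callback instead of being invoked by the iterator for every node.

```javascript
// Same condition as the question's acceptNode, written as a plain predicate.
function isDivChildLink(node) {
  return node.tagName === "A" &&
    node.classList.contains("klass") &&
    node.parentNode != null &&
    node.parentNode.tagName === "DIV";
}

// Fetch every element once, then filter the collection in JS.
function filterInJs(doc, predicate) {
  return Array.prototype.filter.call(doc.getElementsByTagName("*"), predicate);
}

// Browser usage:
// filterInJs(document, isDivChildLink).forEach(function (node) {
//   // do something with node
// });
```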

Note also that selector matching, as used by querySelectorAll, is comparatively smart: it matches right to left and is based on pre-computed caches. Most browsers will iterate over a cached list of all elements with the class "klass", check whether each one is an a element, and then check whether its parent is a div, so they won't even bother with iterating over the entire document.
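
To make the right-to-left idea concrete, here is a conceptual sketch in JS for the selector div > a.klass. It is only an illustration: real engines do this in native code with their own data structures, and getElementsByClassName merely stands in for the cached class list. The function name matchRightToLeft is made up.

```javascript
// Conceptual illustration only, not how any browser is actually implemented.
function matchRightToLeft(doc) {
  const result = [];
  // 1. Start from the rightmost part: elements that carry the class "klass".
  const candidates = doc.getElementsByClassName("klass");
  for (let i = 0; i < candidates.length; i++) {
    const node = candidates[i];
    // 2. Still the rightmost part: the candidate has to be an <a>.
    if (node.tagName !== "A") continue;
    // 3. Only now step left across the ">" combinator and test the parent.
    const parent = node.parentElement;
    if (parent && parent.tagName === "DIV") result.push(node);
  }
  return result;
}

// Browser usage: matchRightToLeft(document) should yield the same elements
// as document.querySelectorAll("div > a.klass").
```

Only the elements that already have the class are ever looked at, which is why the rest of the document does not need to be visited.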

Given that, when to use NodeIterator? Basically never, in JavaScript at least. In languages such as Java (undoubtedly the primary reason why there's an interface called NodeIterator), it will likely be just as quick as anything else, as your filter will then be in the same language as the iterator. Apart from that, the only other time it makes sense is in languages where the memory usage of creating a Node object is far greater than the internal representation of the Node.

gsnedders answered Oct 18 '22 05:10