Everyone knows what DOM selectors like document.getElementById(...) and document.querySelector(...) do and how you can use them with classes, attributes, ids and so on.
But I was not able to find out how they work under the hood (I can find perf test comparisons, but I am interested in the theory). I know that the HTML page is loaded and parsed by the browser, and that the DOM tree is constructed. But how does each of the selectors traverse the DOM tree to find the elements?
I have taken a look at the spec for the parsing algorithm and read a really nice explanation of how browsers work, but although it gives an excellent explanation of HTML and CSS parsing and the rendering flow, it does not explain how each of these selectors traverses the tree to find the elements.
I assume that in order to find something like .black or span it needs to traverse the whole tree, but to find #id it may consult some additional data structure, which would make it much faster. Please do not write your assumptions; I am looking for concrete knowledge backed by the specification or by the implementation in some browser.
Inspecting Firefox's source and reading the related documentation will help get some initial insight.
Once the document is fetched, it's passed to the parser (see: /mozilla/parser/html/) which will chew through the document and generate a content tree. The central parts of the parser are written in Java (/mozilla/parser/html/javasrc/) and then translated to C++ for building, so be ready to have a good time when you want to read the rest of the source.
Looking at the parser's source (/mozilla/parser/html/javasrc/TreeBuilder.java), namely an excerpt from the function startTag:
1579    if (errorHandler != null) {
1580        // ID uniqueness
1581        @IdType String id = attributes.getId();
1582        if (id != null) {
1583            LocatorImpl oldLoc = idLocations.get(id);
1584            if (oldLoc != null) {
1585                err("Duplicate ID \u201C" + id + "\u201D.");
1586                errorHandler.warning(new SAXParseException(
1587                        "The first occurrence of ID \u201C" + id
1588                        + "\u201D was here.", oldLoc));
1589            } else {
1590                idLocations.put(id, new LocatorImpl(tokenizer));
1591            }
1592        }
1593    }
Turning attention to line 1590 and keeping in mind that earlier in the same file we have:
459 private final Map<String, LocatorImpl> idLocations = new HashMap<String, LocatorImpl>();
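We can see that node ids are kept in a simple hash map keyed by the id string (in this excerpt, the stored value is the source location of the first occurrence). Looking up how classes are processed is an exercise left to the reader.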
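To make the bookkeeping in the excerpt easier to follow, here is a minimal, self-contained sketch of the same idea. It is not Mozilla code: the class name DuplicateIdTracker, the method names, and the use of a plain line number in place of LocatorImpl are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the parser's ID bookkeeping; not Mozilla code.
final class DuplicateIdTracker {
    // id -> line of its first occurrence (an int stands in for LocatorImpl)
    private final Map<String, Integer> idLocations = new HashMap<>();
    private final List<String> warnings = new ArrayList<>();

    // Called once per start tag; id is null when the tag has no id attribute.
    void startTag(String id, int line) {
        if (id == null) {
            return;
        }
        Integer firstLine = idLocations.get(id); // average O(1) hash lookup
        if (firstLine != null) {
            warnings.add("Duplicate ID '" + id + "' on line " + line
                    + "; first occurrence on line " + firstLine + ".");
        } else {
            idLocations.put(id, line);
        }
    }

    List<String> warnings() {
        return warnings;
    }

    public static void main(String[] args) {
        DuplicateIdTracker tracker = new DuplicateIdTracker();
        tracker.startTag("header", 1);
        tracker.startTag(null, 2);     // tag without an id is ignored
        tracker.startTag("header", 7); // second use of the same id
        tracker.warnings().forEach(System.out::println);
    }
}
```

The point of the sketch is only that each start tag costs one hash lookup, so checking id uniqueness does not require walking the tree built so far.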
Different DOM methods, for example document.getElementById(...), are connected to this hash map through glue code and a deep object hierarchy; see "How is the web-exposed DOM implemented?" on ask.mozilla.org.
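As a rough illustration of why an id lookup can be cheaper than a class lookup, here is a toy model that contrasts a hash-map index with a full tree walk. Everything in it (ToyDocument, Node, append, the single-class simplification) is hypothetical and is not how Gecko actually wires these methods together; real engines also have to handle node removal, attribute changes, and document order for duplicate ids.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a tree of nodes plus a side index from id to node.
final class ToyDocument {
    static final class Node {
        final String tag;
        final String id;        // null when the element has no id
        final String className; // null when absent; one class for brevity
        final List<Node> children = new ArrayList<>();

        Node(String tag, String id, String className) {
            this.tag = tag;
            this.id = id;
            this.className = className;
        }
    }

    final Node root;
    private final Map<String, Node> idIndex = new HashMap<>();

    ToyDocument(Node root) {
        this.root = root;
        if (root.id != null) {
            idIndex.put(root.id, root);
        }
    }

    // Attaches child to parent and registers its id; the first node with an id wins.
    Node append(Node parent, Node child) {
        parent.children.add(child);
        if (child.id != null) {
            idIndex.putIfAbsent(child.id, child);
        }
        return child;
    }

    // Single hash lookup, independent of tree size on average.
    Node getElementById(String id) {
        return idIndex.get(id);
    }

    // No index here, so every node in the tree is visited once.
    List<Node> getElementsByClassName(String className) {
        List<Node> matches = new ArrayList<>();
        collect(root, className, matches);
        return matches;
    }

    private static void collect(Node node, String className, List<Node> out) {
        if (className.equals(node.className)) {
            out.add(node);
        }
        for (Node child : node.children) {
            collect(child, className, out);
        }
    }

    public static void main(String[] args) {
        Node html = new Node("html", null, null);
        ToyDocument doc = new ToyDocument(html);
        Node body = doc.append(html, new Node("body", "main", null));
        doc.append(body, new Node("span", null, "black"));
        doc.append(body, new Node("p", "intro", "black"));

        System.out.println(doc.getElementById("intro").tag);
        System.out.println(doc.getElementsByClassName("black").size());
    }
}
```

In this model the id lookup touches one map entry, while the class lookup is linear in the number of nodes, which matches the intuition in the question but says nothing about the caches a real browser may add.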