What is the best way to parse (get a DOM tree of) a HTML result of XmlHttpRequest in Firefox?
EDIT:
I do not have the DOM tree, I want to acquire it.
XmlHttpRequest's "responseXML" works only when the result is actual XML, so I have only responseText to work with.
The innerHTML hack doesn't seem to work with a complete HTML document (in <html></html>). - turns out it works fine.
HTML parsing involves tokenization and tree construction. HTML tokens include start and end tags, as well as attribute names and values. If the document is well-formed, parsing it is straightforward and faster. The parser parses tokenized input into the document, building up the document tree.
The DOMParser interface provides the ability to parse XML or HTML source code from a string into a DOM Document . You can perform the opposite operation—converting a DOM tree into XML or HTML source—using the XMLSerializer interface.
HTML is a markup language with a simple structure. It would be quite easy to build a parser for HTML with a parser generator. Actually, you may not need even to do that, if you choose a popular parser generator, like ANTLR. That is because there are already available grammars ready to be used.
innerHTML
should work just fine, e.g.
// This would be after the Ajax request:
var myHTML = XHR.responseText;
var tempDiv = document.createElement('div');
tempDiv.innerHTML = myHTML.replace(/<script(.|\s)*?\/script>/g, '');
// tempDiv now has a DOM structure:
tempDiv.childNodes;
tempDiv.getElementsByTagName('a'); // etc. etc.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With