A JavaScript parser for DOM

We have a special requirement in a project: we have to parse a string of HTML (from an AJAX response) on the client side, using JavaScript only. That's right, no parsing in PHP or Java! I've been going through Stack Overflow this entire week and have not yet found an acceptable solution.

Some more details on the requirements:

  • We can use any library (preferably dojo and/or jQuery) or go native!

  • We need to parse an entire HTML document that we receive as a string, including the <head> and <body>.

  • We also need to serialise the parsed DOM structures back out to strings at times.

  • Finally, we don't want to append the parsed DOM to the current document. Rather, we'll send it back to the server for permanent storage.

E.g., we need something like:

var dom = HTMLtoDOM('<html><head><title> This is the old title. </title></head></html>');
dom.getElementsByTagName('title')[0].innerHTML = "This is a new Title";

From my research, these are our options:

  1. A TinyMCE Parser. Problem? I think we would have to include the whole editor. What about parsing HTML where we don't need an editor?

  2. John Resig's Parser. Should be our best bet. Unfortunately, the parser crashes when the entire contents of a page are given to it!

  3. The jQuery $(htmlString) or the dojo.toDom(htmlString). Both rely on DocumentFragment and hence gobble up <head> and <body>!

EDIT: We want to serialize the HTML so that we can catch certain custom HTML comments via RegExp. We also need to give users the opportunity to edit meta tags, title tags, etc., hence the HTML parser.

Oh, and I feel I will be murdered on Stack Overflow even if I just hint at parsing HTML via RegExp!!!
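To make the comment-matching step from the EDIT concrete, here is a minimal sketch of pulling comment bodies out of an already serialised HTML string. The helper name `findComments` and the `meta:start` / `meta:end` markers in the sample are made up for illustration; the real custom comment format is not specified above.

```javascript
// Sketch: collect the text of every HTML comment in a serialised document.
// findComments is a hypothetical helper name, not part of any library.
function findComments(html) {
    // [\s\S]*? matches lazily across newlines, so multi-line comments work.
    var pattern = /<!--([\s\S]*?)-->/g;
    var comments = [];
    var match;
    while ((match = pattern.exec(html)) !== null) {
        comments.push(match[1].trim());
    }
    return comments;
}

// The comment markers below are invented sample data.
var sample = '<html><head><!-- meta:start --><title>Old</title><!-- meta:end --></head></html>';
console.log(findComments(sample)); // logs the two comment bodies: "meta:start" and "meta:end"
```

This only locates comments in a string; it does not parse the markup, so comment-like text inside a <script> or <textarea> would presumably be matched as well.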

Gaurav Ramanan asked Mar 02 '12 20:03




1 Answer

You can leverage the current document without appending any nodes to it.

Try something like this:

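function toNode(html) {
    // Create a detached <html> element; it is never appended to the page.
    var doc = document.createElement('html');
    // Let the browser's own parser build the <head> and <body> children.
    doc.innerHTML = html;
    return doc;
}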

var node = toNode('<html><head><title> This is the old title. </title></head></html>');

console.log(node);

http://jsfiddle.net/6SvqA/3/
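Building on the same idea, the sketch below edits the title on the detached node and serialises it back to a string with `outerHTML`, which covers the "send it back to the server" requirement. It assumes a browser (or other DOM) environment; `editTitle` and `serialize` are hypothetical helper names introduced for illustration, and `toNode` is repeated so the block runs on its own.

```javascript
// Sketch only: needs a DOM environment (a browser); nothing is appended to the page.
function toNode(html) {
    var root = document.createElement('html');
    root.innerHTML = html;
    return root;
}

// Hypothetical helper: replace the text of the first <title>, if there is one.
function editTitle(root, newTitle) {
    var title = root.getElementsByTagName('title')[0];
    if (title) {
        title.textContent = newTitle;
    }
    return root;
}

// Hypothetical helper: turn the detached tree back into an HTML string.
function serialize(root) {
    return root.outerHTML;
}

// Guarded so the definitions can be loaded where no DOM exists.
if (typeof document !== 'undefined') {
    var edited = editTitle(
        toNode('<html><head><title> This is the old title. </title></head></html>'),
        'This is a new Title'
    );
    console.log(serialize(edited));
}
```

As far as I can tell, a doctype and any attributes on the incoming <html> tag are not carried over by this approach, so they would need to be handled separately if they matter. In current browsers, `new DOMParser().parseFromString(html, 'text/html')` is another way to get a detached Document without touching the page.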

Dagg Nabbit answered Oct 14 '22 08:10
