I'm attempting to map HTML into JSON with its structure intact. Are there any libraries out there that do this, or will I need to write my own? I suppose if there are no html2json libraries out there, I could take an xml2json library as a starting point. After all, HTML is only a variant of XML anyway, right?
UPDATE: Okay, I should probably give an example. What I'm trying to do is parse a string of HTML like this:
<div> <span>text</span>Text2 </div>
into a json object like so:
{ "type" : "div", "content" : [ { "type" : "span", "content" : [ "text" ] }, "Text2" ] }
NOTE: In case you didn't notice the tag, I'm looking for a solution in JavaScript.
I just wrote this function that does what you want; try it out and let me know if it doesn't work correctly for you:
```javascript
// Test with an element.
var initElement = document.getElementsByTagName("html")[0];
var json = mapDOM(initElement, true);
console.log(json);

// Test with a string.
initElement = "<div><span>text</span>Text2</div>";
json = mapDOM(initElement, true);
console.log(json);

function mapDOM(element, json) {
    var treeObject = {};

    // If a string was passed, convert it to a document node.
    if (typeof element === "string") {
        var docNode; // declared locally so it doesn't leak into the global scope
        if (window.DOMParser) {
            var parser = new DOMParser();
            docNode = parser.parseFromString(element, "text/xml");
        } else {
            // Microsoft strikes again (old IE has no DOMParser)
            docNode = new ActiveXObject("Microsoft.XMLDOM");
            docNode.async = false;
            docNode.loadXML(element);
        }
        element = docNode.firstChild;
    }

    // Recursively loop through DOM nodes and assign properties to object
    function treeHTML(element, object) {
        object["type"] = element.nodeName;
        var nodeList = element.childNodes;
        if (nodeList != null && nodeList.length) {
            object["content"] = [];
            for (var i = 0; i < nodeList.length; i++) {
                if (nodeList[i].nodeType == 3) {
                    // Text node: store its text as a plain string
                    object["content"].push(nodeList[i].nodeValue);
                } else {
                    // Any other node: recurse into a new child object
                    object["content"].push({});
                    treeHTML(nodeList[i], object["content"][object["content"].length - 1]);
                }
            }
        }
        if (element.attributes != null && element.attributes.length) {
            object["attributes"] = {};
            for (var j = 0; j < element.attributes.length; j++) {
                object["attributes"][element.attributes[j].nodeName] = element.attributes[j].nodeValue;
            }
        }
    }

    treeHTML(element, treeObject);

    // Return a JSON string if requested, otherwise the plain object
    return (json) ? JSON.stringify(treeObject) : treeObject;
}
```
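The recursion in treeHTML only reads nodeName, nodeType, nodeValue, childNodes and attributes, so the same walk can be tried outside a browser against plain objects that mimic those properties. Below is a minimal sketch of that idea, not the code above verbatim: toTree, the trimText flag and the text/elem mock builders are hypothetical names. With trimText set, text nodes are trimmed and whitespace-only ones are dropped, which is closer to the format the question asks for, where the spaces around the tags in the input do not appear in the output.

```javascript
// Sketch of the treeHTML walk against any DOM-like object.
// toTree and trimText are hypothetical names used for illustration.
var TEXT_NODE = 3;

function toTree(node, trimText) {
  var result = { type: node.nodeName };
  var children = node.childNodes || [];
  var content = [];
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    if (child.nodeType === TEXT_NODE) {
      // Optionally trim text; when trimming, skip nodes that become empty
      var value = trimText ? child.nodeValue.trim() : child.nodeValue;
      if (!trimText || value !== "") {
        content.push(value);
      }
    } else {
      // Non-text node: recurse and store the child's subtree
      content.push(toTree(child, trimText));
    }
  }
  if (content.length) {
    result.content = content;
  }
  // Copy attributes into a name -> value map, as mapDOM does
  var attrs = node.attributes || [];
  if (attrs.length) {
    result.attributes = {};
    for (var j = 0; j < attrs.length; j++) {
      result.attributes[attrs[j].nodeName] = attrs[j].nodeValue;
    }
  }
  return result;
}

// Hypothetical mock nodes standing in for a parsed
// "<div> <span>text</span>Text2 </div>".
function text(value) {
  return { nodeName: "#text", nodeType: TEXT_NODE, nodeValue: value };
}
function elem(name, children) {
  return { nodeName: name, nodeType: 1, childNodes: children, attributes: [] };
}

var div = elem("div", [text(" "), elem("span", [text("text")]), text("Text2 ")]);
console.log(JSON.stringify(toTree(div, true)));
```

Keeping the walk separate from the parsing step is what makes it testable this way; in a page you would call it on a real element (for example document.body) instead of the mocks.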
Working example: http://jsfiddle.net/JUSsf/ (tested in Chrome; I can't guarantee full browser support, so you will have to test this).
It creates an object that contains the tree structure of the HTML page in the format you requested and then, when the second argument is true, serializes it with JSON.stringify(), which is included in most modern browsers (IE8+, Firefox 3.5+, etc.). If you need to support older browsers, you can include json2.js.
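JSON.stringify() also accepts optional replacer and space arguments, which can be handy for the output of mapDOM. A minimal sketch, assuming a tree already shaped like the one mapDOM returns (written by hand here rather than produced from a real DOM): a space argument of 2 indents the output for reading, JSON.parse() turns the string back into an object, and an array replacer keeps only the listed keys, which is one way to drop the attributes map if you don't need it.

```javascript
// A tree in the format mapDOM produces (hand-written for this example).
var tree = {
  type: "div",
  content: [{ type: "span", content: ["text"] }, "Text2"]
};

// Compact string, like mapDOM(element, true) returns.
var compact = JSON.stringify(tree);

// Indented with two spaces per level, easier to read when debugging.
var pretty = JSON.stringify(tree, null, 2);

// Either string parses back into an equivalent plain object.
var roundTrip = JSON.parse(pretty);

// An array replacer keeps only the named keys at every level,
// so the "attributes" map is left out here.
var withAttrs = { type: "a", attributes: { href: "#" }, content: ["link"] };
var filtered = JSON.stringify(withAttrs, ["type", "content"]);

console.log(compact);
console.log(pretty);
console.log(filtered);
```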
It can take either a DOM element or a string containing valid XHTML as an argument. (I'm not sure whether DOMParser() will choke in certain situations, since it is set to "text/xml", or whether it just doesn't provide error handling. Unfortunately, "text/html" has poor browser support.)
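On the error-handling question: DOMParser does not throw on malformed input; browsers return a document that contains a <parsererror> element instead. Ordinary HTML such as an unclosed <br> is not well-formed XML, so it would end up there. A minimal sketch of a check, where parseXmlStrict is a hypothetical helper and the parser is passed in only so the function can be exercised without a browser (in a page you would pass new DOMParser()):

```javascript
// Hypothetical helper: parse markup as XML and throw if the browser
// reported a parse failure via a <parsererror> element.
function parseXmlStrict(parser, markup) {
  var doc = parser.parseFromString(markup, "text/xml");
  // getElementsByTagName on the document also matches the root element,
  // so this catches both placements browsers use for <parsererror>.
  if (doc.getElementsByTagName("parsererror").length > 0) {
    throw new Error("Markup is not well-formed XML: " + markup);
  }
  return doc.documentElement;
}

// In a browser:
// var root = parseXmlStrict(new DOMParser(), "<div><span>text</span>Text2</div>");
```

One caveat with this heuristic: a well-formed document that happens to contain its own element named parsererror would also be reported as an error.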
You can easily change the scope of this function by passing a different value as element (for example, document.body). Whatever value you pass will be the root of your JSON map.