
Map HTML to JSON [closed]

I'm attempting to map HTML into JSON with its structure intact. Are there any libraries out there that do this, or will I need to write my own? I suppose if there are no html2json libraries out there, I could take an xml2json library as a start. After all, HTML is only a variant of XML anyway, right?

UPDATE: Okay, I should probably give an example. What I'm trying to do is the following. Parse a string of html:

    <div>
      <span>text</span>Text2
    </div>

into a json object like so:

    {
      "type" : "div",
      "content" : [
        {
          "type" : "span",
          "content" : [
            "text"
          ]
        },
        "Text2"
      ]
    }

NOTE: In case you didn't notice the tag, I'm looking for a solution in JavaScript.

Julian Krispel-Samsel asked Oct 19 '12 18:10




1 Answer

I just wrote this function that does what you want; try it out and let me know if it doesn't work correctly for you:

    // Test with an element.
    var initElement = document.getElementsByTagName("html")[0];
    var json = mapDOM(initElement, true);
    console.log(json);

    // Test with a string.
    initElement = "<div><span>text</span>Text2</div>";
    json = mapDOM(initElement, true);
    console.log(json);

    function mapDOM(element, json) {
        var treeObject = {};

        // If string, convert to a document node.
        if (typeof element === "string") {
            var docNode; // declared locally so it does not leak as a global
            if (window.DOMParser) {
                var parser = new DOMParser();
                docNode = parser.parseFromString(element, "text/xml");
            } else { // Microsoft strikes again
                docNode = new ActiveXObject("Microsoft.XMLDOM");
                docNode.async = false;
                docNode.loadXML(element);
            }
            element = docNode.firstChild;
        }

        // Recursively loop through DOM elements and assign properties to object.
        function treeHTML(element, object) {
            object["type"] = element.nodeName;
            var nodeList = element.childNodes;
            if (nodeList != null) {
                if (nodeList.length) {
                    object["content"] = [];
                    for (var i = 0; i < nodeList.length; i++) {
                        if (nodeList[i].nodeType == 3) {
                            // Text node: store its text directly.
                            object["content"].push(nodeList[i].nodeValue);
                        } else {
                            // Any other node: recurse into a new child object.
                            object["content"].push({});
                            treeHTML(nodeList[i], object["content"][object["content"].length - 1]);
                        }
                    }
                }
            }
            if (element.attributes != null) {
                if (element.attributes.length) {
                    object["attributes"] = {};
                    for (var i = 0; i < element.attributes.length; i++) {
                        object["attributes"][element.attributes[i].nodeName] = element.attributes[i].nodeValue;
                    }
                }
            }
        }
        treeHTML(element, treeObject);

        return (json) ? JSON.stringify(treeObject) : treeObject;
    }

Working example: http://jsfiddle.net/JUSsf/ (tested in Chrome; I can't guarantee full browser support, so you will have to test this yourself).

It creates an object that contains the tree structure of the HTML page in the format you requested and then uses JSON.stringify(), which is included in most modern browsers (IE8+, Firefox 3.5+, etc.). If you need to support older browsers, you can include json2.js.
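One thing to watch for: with the markup from the question, which has spaces between the tags, the whitespace-only text nodes end up in content as well, and "Text2 " keeps its trailing space. Below is a minimal sketch of a post-processing pass over the resulting object, assuming the type/content/attributes shape produced by mapDOM. The name cleanTree is hypothetical and not part of the function above.

```javascript
// Hypothetical helper: trims text and drops whitespace-only strings from a
// tree shaped like mapDOM's output ({ type, content, attributes }).
function cleanTree(node) {
    // Text nodes are stored as plain strings.
    if (typeof node === "string") {
        return node.trim();
    }
    var cleaned = { type: node.type };
    if (node.content) {
        var content = node.content
            .map(function (child) { return cleanTree(child); })
            .filter(function (child) { return child !== ""; });
        if (content.length) {
            cleaned.content = content;
        }
    }
    if (node.attributes) {
        cleaned.attributes = node.attributes;
    }
    return cleaned;
}

// The object mapDOM would build for "<div>   <span>text</span>Text2 </div>".
var raw = {
    type: "div",
    content: ["   ", { type: "span", content: ["text"] }, "Text2 "]
};
console.log(JSON.stringify(cleanTree(raw)));
// {"type":"div","content":[{"type":"span","content":["text"]},"Text2"]}
```

Trimming is not always what you want (inside a pre element, for example, whitespace is significant), so treat this as a starting point. Also note that when you pass an element from an HTML page instead of a string, nodeName is reported in upper case, so type will be "DIV" rather than "div".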

It can take either a DOM element or a string containing valid XHTML as an argument. I'm not sure whether DOMParser() will choke in certain situations because it is set to "text/xml", or whether it just doesn't provide error handling. Unfortunately "text/html" has poor browser support.
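On the error-handling point: when DOMParser is given "text/xml" and the string is not well-formed, it does not throw; the document it returns contains a parsererror element instead. Below is a minimal sketch of a check for that, under a few assumptions: parseXhtml is a hypothetical name, and the ParserCtor parameter exists only so the function can be exercised outside a browser (it defaults to the global DOMParser).

```javascript
// Hypothetical helper, not part of mapDOM: parses a string as XML and throws
// if the parser reports an error instead of silently returning an error document.
function parseXhtml(markup, ParserCtor) {
    ParserCtor = ParserCtor || globalThis.DOMParser;
    if (typeof ParserCtor !== "function") {
        throw new Error("No DOMParser available in this environment");
    }
    var doc = new ParserCtor().parseFromString(markup, "text/xml");
    // XML parse failures show up as a <parsererror> element in the result.
    var errors = doc.getElementsByTagName("parsererror");
    if (errors.length > 0) {
        throw new SyntaxError("Markup is not well-formed XML: " + errors[0].textContent);
    }
    return doc.documentElement;
}

// Browser-only usage; skipped where DOMParser does not exist.
if (typeof DOMParser !== "undefined") {
    try {
        // <br> is never closed, so this is valid HTML but not well-formed XML.
        parseXhtml("<div><br></div>");
    } catch (e) {
        console.log(e.message);
    }
}
```

The element it returns can be passed straight to mapDOM, since mapDOM accepts a DOM element as well as a string.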

You can easily change the range of this function by passing a different value as element. Whatever value you pass will be the root of your JSON map.

George Reith answered Sep 29 '22 12:09