
Flatten nested spans in DOM using JavaScript to refine HTML editor output

I need to use JavaScript to reformat input HTML so that the output is always a sequence of <p> nodes, each containing one or more <span> nodes and nothing else, with each <span> containing exactly one #text node.
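To make the target shape concrete, the requirement can be written as a check. This is only a rough sketch: `isNormalized` is a hypothetical name, and it reads nothing but `nodeType`, `nodeName` and `childNodes`, so it should work on real DOM nodes as well as on plain objects with the same fields.

```javascript
// Hypothetical checker (not part of any library): returns true when `p` is a
// <p> whose children are all <span> elements, each holding exactly one #text node.
function isNormalized(p) {
    if (p.nodeType !== 1 || p.nodeName.toLowerCase() !== 'p') return false;
    // "one or more" spans are required, so an empty paragraph fails
    if (p.childNodes.length === 0) return false;
    for (var i = 0; i < p.childNodes.length; ++i) {
        var span = p.childNodes[i];
        // every direct child must be a <span> element
        if (span.nodeType !== 1 || span.nodeName.toLowerCase() !== 'span') return false;
        // and every span must contain a single text node (nodeType 3)
        if (span.childNodes.length !== 1 || span.childNodes[0].nodeType !== 3) return false;
    }
    return true;
}
```

Note that this check is strict: a whitespace-only text node sitting between two spans counts as a violation.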

To provide an example, I'd like to convert HTML which looks like this:

<p style="color:red">This is line #1</p>
<p style="color:blue"><span style="color:yellow"><span style="color:red">This is</span> line #2</span></p>
<p style="color:blue"><span style="color:yellow"><span style="color:green">This is line #3</span></span></p>
<p style="color:blue"><span style="color:yellow">This is</span><span style="color:red">line #4</span></p>

To HTML which looks like this:

<p style="color:red"><span style="color:red">This is line #1</span></p>
<p style="color:red"><span style="color:red">This is</span><span style="color:yellow"> line #2</span></p>
<p style="color:green"><span style="color:green">This is line #3</span></p>
<p style="color:yellow"><span style="color:yellow">This is</span><span style="color:red">line #4</span></p>

Additional, somewhat tangential information:

  • The text is within a TinyMCE editor. The HTML needs to conform to this pattern to make the application more usable and to give the PDF output engine usable HTML: wkhtmltopdf has line-height issues if the HTML gets too complex, and nested spans make editing in TinyMCE non-intuitive.
  • jQuery is not available. Prototype.JS is available in the parent window but not directly in this document. I'm capable of reformatting jQuery code to pure JavaScript myself but can't actually use jQuery in this instance :-(
  • Yes, I have existing code. The logic is clearly so horribly wrong that it's not worth sharing yet. I'm working to improve it and will share it once it gets reasonably close to being useful.
  • I really do know what I'm doing! I've just been staring at this code too long and so the proper algorithm to use is evading me right now...

Additional, halfway finished, nonfunctional code I am still playing with, to mitigate downvotes:

function reformatChildNodes(node) {
    var n,l,parent;
    if(node.nodeName.toLowerCase() == 'p') {
        // We are on a root <p> node, make sure that it has at least one child span node:
        if(!node.childNodes.length) {
            var newSpan = document.createElement('span');
            /* set style on newSpan here */
            node.appendChild(newSpan);
        }
        if(node.childNodes[0].nodeName.toLowerCase() != 'span') {
            // First child of the <p> node is not a span, so wrap it in one:
            var newSpan = document.createElement('span');
            /* set style on newSpan here */
            newSpan.appendChild(node.childNodes[0]);
            node.appendChild(newSpan);
        }
        // Now repeat for each child node of the <p> and make sure they are all <span> nodes:
        for(n=0;n<node.childNodes.length;++n)
            reformatChildNodes(node.childNodes[n]);
    } else if(node.nodeName.toLowerCase() == 'span') {
        // We are on a <span> node, make sure that it has only a single #text node
        if(!node.childNodes.length) {
            // This span has no children! it should be removed...
        } else if(node.parentNode.nodeName.toLowerCase() != 'p') {
            // We have a <span> that's not a direct child of a <p>, so we need to reformat it:
            node.parentNode.parentNode.insertBefore(node, parent);
        } else {
            for(n=0;n<node.childNodes.length;++n)
                reformatChildNodes(node.childNodes[n]);
        }
    } else if(node.nodeName.toLowerCase() == 'div') {
        // This is just a dirty hack for this example, my app calls reformatChildNodes on all nodes
        for(n=0;n<node.childNodes.length;++n)
            reformatChildNodes(node.childNodes[n]);
    }
}
Josh asked Sep 19 '13 20:09


1 Answer

This solution iterates over the paragraph's child nodes, unwrapping spans where necessary and then continuing with the just-unwrapped elements, so that every level of nesting is handled. What remains are only top-level spans with text node children.

function wrap(text, color) {
   var span = document.createElement("span");
   span.style.color = color;
   span.appendChild(text);
   return span;
}
function format(p) {
    for (var cur = p.firstChild; cur != null; cur = next) {
        var next = cur.nextSibling;
        if (cur.nodeType == 3) {
            // top-level text nodes are wrapped in spans
            next = p.insertBefore(wrap(cur, p.style.color), next);
        } else {
            if (cur.childNodes.length == 1 && cur.firstChild.nodeType == 3)
               continue;
            // top-level spans are unwrapped…
            while (cur.firstChild) {
                if (cur.firstChild.nodeType == 1)
                    // with nested spans becoming unnested
                    p.insertBefore(cur.firstChild, next);
                else
                    // and child text nodes becoming wrapped again
                    p.insertBefore(wrap(cur.firstChild, cur.style.color), next);
            }
            // now empty span is removed
            next = cur.nextSibling;
            p.removeChild(cur);
        }
    }
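    // the paragraph takes the color of its first span (skipped if the paragraph ended up empty)
    if (p.firstChild) p.style.color = p.firstChild.style.color;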
}

(Demo at jsfiddle.net)
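One case the loop above does not cover is a nested span with no color of its own: it is moved up to the paragraph unchanged, so it loses the color of the span that used to contain it. A possible way around that is to collect the text runs recursively while carrying the inherited color down. The sketch below is only an illustration under that assumption; `collectRuns` is a hypothetical helper, and it reads nothing but `nodeType`, `nodeValue`, `childNodes` and `style.color`.

```javascript
// Hypothetical helper: returns [{text, color}, ...] for every text node under
// `node`, in document order. `inherited` is the color in effect from ancestors.
function collectRuns(node, inherited) {
    var runs = [];
    for (var i = 0; i < node.childNodes.length; ++i) {
        var child = node.childNodes[i];
        if (child.nodeType === 3) {
            // text node: record it with whatever color applies at this depth
            runs.push({ text: child.nodeValue, color: inherited });
        } else if (child.nodeType === 1) {
            // element: its own color wins; otherwise keep the ancestor's color
            var own = child.style && child.style.color;
            runs = runs.concat(collectRuns(child, own || inherited));
        }
    }
    return runs;
}
```

Called as collectRuns(p, p.style.color), this gives one entry per text node; each entry could then be turned back into a span, for example by creating a text node from its text and passing that to wrap together with its color.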

Bergi answered Oct 03 '22 09:10