innerHTML unencodes &lt; in attributes

Question

I have an HTML document that might have &lt; and &gt; in some of its attributes. I am trying to extract this markup and run it through an XSLT, but the XSLT engine fails with an error saying that < is not valid inside an attribute.

I did some digging and found that the characters are properly escaped in the source document, but when the markup is loaded into the DOM via innerHTML and read back, the attributes come out unencoded. Strangely, this happens for &lt; and &gt;, but not for some others such as &amp;.

Here is a simple example:

var div = document.createElement('DIV');
div.innerHTML = '<div asdf="&lt;50" fdsa="&amp;50"></div>';
// The logged markup has asdf="<50" (unescaped), while fdsa="&amp;50" stays escaped.
console.log(div.innerHTML);

I'm assuming that the DOM implementation decided that HTML attributes can be less strict than XML attributes, and that this is "working as intended". My question is, can I work around this without writing some horrible regex replacement?

Martin Honnen · Accepted Answer

Try XMLSerializer, which serializes the DOM as XML rather than HTML:

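// The element defined in the markup shown after this script.
var div = document.getElementById('d1');

// First <pre>: the HTML serialization (outerHTML), for comparison.
var pre = document.createElement('pre');
pre.textContent = div.outerHTML;
document.body.appendChild(pre);

// Second <pre>: the XML serialization, which escapes attribute values as XML requires.
pre = document.createElement('pre');
pre.textContent = new XMLSerializer().serializeToString(div);
document.body.appendChild(pre);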

<div id="d1" data-foo="a &lt; b &amp;&amp; b &gt; c">This is a test</div>

You might need to adapt the XSLT to take account of the XHTML namespace XMLSerializer inserts (at least here in a test with Firefox).
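As a rough sketch of what that adaptation could look like, the snippet below applies XMLSerializer to the example from the question and pairs it with a stylesheet that binds a prefix to the XHTML namespace. The names XHTML_NS, STYLESHEET and toXml are made up for illustration, the stylesheet is a hypothetical stand-in for the real one, and the browser-only part assumes XSLTProcessor is available.

```javascript
// XHTML namespace that XMLSerializer puts on serialized HTML elements.
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

// Hypothetical stylesheet: the prefix "h" is bound to the XHTML namespace so
// that match="h:div" finds the serialized element. Attributes are in no
// namespace, so @asdf needs no prefix.
const STYLESHEET = `<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="${XHTML_NS}">
  <xsl:output method="text"/>
  <xsl:template match="h:div">
    <xsl:value-of select="@asdf"/>
  </xsl:template>
</xsl:stylesheet>`;

// Serialize a node as XML. The serializer parameter exists only so the helper
// can be exercised with a stand-in where no browser DOM is available.
function toXml(node, serializer = new XMLSerializer()) {
  return serializer.serializeToString(node);
}

// Browser-only usage, guarded so the script also loads where no DOM exists.
if (typeof document !== 'undefined' && typeof XSLTProcessor !== 'undefined') {
  const wrapper = document.createElement('div');
  wrapper.innerHTML = '<div asdf="&lt;50" fdsa="&amp;50"></div>';

  // XML serialization of the inner element, with attribute values escaped.
  const xml = toXml(wrapper.firstChild);
  console.log(xml);

  // Parse both documents as XML and run the transformation.
  const parser = new DOMParser();
  const processor = new XSLTProcessor();
  processor.importStylesheet(parser.parseFromString(STYLESHEET, 'application/xml'));
  const source = parser.parseFromString(xml, 'application/xml');
  const result = processor.transformToFragment(source, document);
  console.log(result.textContent);
}
```

Without the xmlns:h binding, a pattern such as match="div" would look for an element in no namespace and so would not match the serialized element.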

Tags: javascript, html, xml, innerhtml, xslt

murrayju
