
innerHTML converts CDATA to comments

I'm trying to insert some HTML into a page using JavaScript, and the HTML I'm inserting contains CDATA blocks.

I'm finding, in Firefox and Chrome, that the CDATA is getting converted to a comment.

The HTML is not under my control, so it's difficult for me to avoid using CDATA.

The following test case, when there is a div on the page with id "test":

document.getElementById('test').innerHTML = '<![CDATA[foo]]> bar'

causes the following HTML to be appended to the 'test' div:

<!--[CDATA[foo]]--> bar

Is there any way I can insert, verbatim, HTML containing CDATA into a document using JavaScript?

Rich asked Aug 15 '11 13:08




1 Answer

document.createCDATASection should do it, but the real answer to your question is that, although HTML 5 does have CDATA sections (only inside foreign content such as SVG or MathML), cross-browser support for them is pretty spotty.

EDIT

The CDATA sections just aren't in the HTML 4 definition, so most browsers won't recognize them.
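To make the observed behaviour concrete, here is a rough sketch of what an HTML parser does with a CDATA opener in ordinary HTML content: everything between `<!` and the first `>` is treated as comment data. The function name `simulateBogusComment` is made up for illustration, and this only imitates the comment part of the parse, not the whole parser.

```javascript
// Rough imitation of how an HTML parser handles "<![CDATA[" outside
// SVG/MathML: the text between "<!" and the FIRST ">" becomes comment data.
// simulateBogusComment is a hypothetical name, not a browser API.
function simulateBogusComment(html) {
  return html.replace(/<!(\[CDATA\[[^>]*)>/g, function (match, commentData) {
    return "<!--" + commentData + "-->";
  });
}

console.log(simulateBogusComment("<![CDATA[foo]]> bar"));
// <!--[CDATA[foo]]--> bar

// The comment ends at the first ">", so markup inside the CDATA section
// leaks out of it. (A real parser would then go on to parse the remainder
// as ordinary markup; this sketch leaves it untouched.)
console.log(simulateBogusComment("<![CDATA[<i>bar</i>]]>"));
// <!--[CDATA[<i-->bar</i>]]>
```

This matches the output in the question, and it also shows why the CDATA content has to be escaped before it ever reaches innerHTML.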

But working around this doesn't require a full DOM parser. Here's a simple lexical solution that will fix the problem.

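// Tokenizes the HTML with a single regex and replaces each CDATA section
// with its content as HTML-escaped text. Comments and the content of
// script, style, textarea, title, and xmp elements are passed through as-is.
function htmlWithCDATASectionsToHtmlWithout(html) {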
    var ATTRS = "(?:[^>\"\']|\"[^\"]*\"|\'[^\']*\')*",
        // names of tags with RCDATA or CDATA content.
        SCRIPT = "[sS][cC][rR][iI][pP][tT]",
        STYLE = "[sS][tT][yY][lL][eE]",
        TEXTAREA = "[tT][eE][xX][tT][aA][rR][eE][aA]",
        TITLE = "[tT][iI][tT][lL][eE]",
        XMP = "[xX][mM][pP]",
        SPECIAL_TAG_NAME = [SCRIPT, STYLE, TEXTAREA, TITLE, XMP].join("|"),
        ANY = "[\\s\\S]*?",
        AMP = /&/g,
        LT = /</g,
        GT = />/g;
    return html.replace(new RegExp(
        // Entities and text
        "[^<]+" +
        // Comment
        "|<!--"+ANY+"-->" +
        // Regular tag
        // (the \b keeps tags such as <styled> from being excluded here)
        "|<\/?(?!(?:"+SPECIAL_TAG_NAME+")\\b)[a-zA-Z]"+ATTRS+">" +
        // Special tags
        "|<\/?"+SCRIPT  +"\\b"+ATTRS+">"+ANY+"<\/"+SCRIPT  +"\\s*>" +
        "|<\/?"+STYLE   +"\\b"+ATTRS+">"+ANY+"<\/"+STYLE   +"\\s*>" +
        "|<\/?"+TEXTAREA+"\\b"+ATTRS+">"+ANY+"<\/"+TEXTAREA+"\\s*>" +
        "|<\/?"+TITLE   +"\\b"+ATTRS+">"+ANY+"<\/"+TITLE   +"\\s*>" +
        "|<\/?"+XMP     +"\\b"+ATTRS+">"+ANY+"<\/"+XMP     +"\\s*>" +
        // CDATA section.  Content in capturing group 1.
        "|<!\\[CDATA\\[("+ANY+")\\]\\]>" +
        // A loose less-than
        "|<", "g"),

        function (token, cdataContent) {
          return "string" === typeof cdataContent
              ? cdataContent.replace(AMP, "&amp;").replace(LT, "&lt;")
                .replace(GT, "&gt;")
              : token === "<"
              ? "&lt;"  // Normalize loose less-thans.
              : token;
        });
}

Given

<b>foo</b><![CDATA[<i>bar</i>]]>

it produces

<b>foo</b>&lt;i&gt;bar&lt;/i&gt;

and given something that looks like a CDATA section inside a script or other special tag or comment, it correctly does not muck with it:

<script>/*<![CDATA[*/foo=bar<baz&amp;//]]></script><![CDATA[fish: <><]]>

becomes

<script>/*<![CDATA[*/foo=bar<baz&amp;//]]></script>fish: &lt;&gt;&lt;
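To see why the regex above matches comments and special tags before it looks for CDATA sections, compare it with a deliberately naive version that replaces every CDATA-looking span wherever it appears. This is only a sketch for comparison; `escapeHtmlText` and `naiveCdataToText` are made-up names, not part of the solution above.

```javascript
// Hypothetical helper: HTML-escapes the three characters that matter in text.
function escapeHtmlText(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Deliberately naive: replaces every CDATA-looking span, with no awareness
// of comments or of script/style/textarea/title/xmp content.
function naiveCdataToText(html) {
  return html.replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, function (match, content) {
    return escapeHtmlText(content);
  });
}

// Fine for plain markup:
console.log(naiveCdataToText("<b>foo</b><![CDATA[<i>bar</i>]]>"));
// <b>foo</b>&lt;i&gt;bar&lt;/i&gt;

// But it rewrites the inside of a script element, changing the script:
console.log(naiveCdataToText("<script>/*<![CDATA[*/foo=bar<baz//]]></script>"));
// <script>/**/foo=bar&lt;baz//</script>
```

The second call strips the CDATA markers out of the script's comments and escapes the `<` in its code, which is exactly the kind of damage the token-by-token approach avoids.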
Mike Samuel answered Sep 30 '22 12:09