Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does the CDATA syntax in XML look so strange?

Tags:

xml

cdata

CDATA is used in XML like:

<my-tag><![CDATA[my-data]]></my-tag>

It's quite an unusual syntax. When I first saw it, I assumed it was a specific form of some general XML construct I had yet to learn. But, as far as I can tell (XML CDATA spec) it isn't.

My question: Is there a reason why the CDATA section looks like it does, e.g. is i a special case of some SGML thing? Or did some language designer just think one day "I'll make a CDATA section with a bracket before CDATA, a bracket afterwards, an exclamation mark, surrounded by angle brackets."

like image 923
Adrian Smith Avatar asked Jan 14 '23 22:01

Adrian Smith


1 Answers

The CDATA section is a marked section. In SGML there is both an abstract syntax as well as a concrete syntax. The abstract syntax of a marked section declaration begins with a markup declaration open (mdo) delimiter followed by a declaration subset open (dso) delimiter. A status keyword comes next followed by a second declaration subset open (dso) delimiter. A marked section ends with a marked section close (msc) delimiter followed by a markup declaration close (mdc) delimiter. Therefore the abstract syntax of a marked section declaration is:

mdo dso status-keyword dso my-data msc mdc

A concrete syntax is defined for each document. This syntax is specified within the SGML declaration associated with each document. The concrete syntax defines the delimiters to be used for the document. The default SGML delimiters, which I assume are defined in ISO 8879:1986, are as follows:

  • Markup declaration open: <!
  • Declaration subset open: [
  • Marked section close: ]]
  • Markup declaration close: >

But you are free to define your own concrete syntax and so can modify the characters used as the delimiters.

Therefore the default concrete syntax of a marked section declaration is:

<![ status-keyword [my-data]]>

Possible status-keywords are: CDATA, RCDATA, IGNORE, INCLUDE, TEMP

Which brings us to:

<![ CDATA [my-data]]>

See the following chapters from the book SGML and HTML Explained by Martin Bryan:

  • The SGML Declaration
  • Marked Sections and Processing Instructions
like image 84
chris Avatar answered Jan 17 '23 07:01

chris