
Difference between PCDATA and CDATA in DTD

Tags: xml, dtd

What is the difference between #PCDATA and #CDATA in DTD?

Jakub Arnold asked May 27 '09 23:05



1 Answer

  • PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded.
  • CDATA is text that will not be parsed by a parser. Tags inside the text will not be treated as markup and entities will not be expanded.

By default, everything is PCDATA. In the following example, ignoring the root, the content of <bar> will be parsed: <bar> ends up with no text content of its own, but with one child element, <test>.

    <?xml version="1.0"?>
    <foo>
      <bar><test>content!</test></bar>
    </foo>

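When we want to specify that an element will contain only text, and no child elements, we use the keyword #PCDATA, which specifies that the element contains parsed character data: any text in which the less-than sign (<) and the ampersand (&) are written as entity references (&lt; and &amp;), since the parser would otherwise read them as the start of markup. The greater-than sign (>), the apostrophe (') and the double quote (") may appear literally in element content, except that > must be escaped when it follows ]] (the sequence ]]>). Quotes only need escaping inside attribute values delimited by the same quote character.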
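To check which characters actually need escaping in parsed character data, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The helper name is_well_formed is made up for this illustration, and the parser only checks well-formedness; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# Quotes and a lone '>' are allowed as-is in element content;
# '<' and '&' have to be written as &lt; and &amp;.
bar = ET.fromstring("""<bar>"a" &lt; 'b' &amp; c > d</bar>""")
print(bar.text)


def is_well_formed(xml_text):
    """Hypothetical helper: True if the parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A raw '<' or '&' inside PCDATA is rejected by the parser.
print(is_well_formed("<bar>a < b</bar>"))
print(is_well_formed("<bar>a & b</bar>"))
```

The first call succeeds and the entity references come back as the characters they stand for; the two raw variants are rejected.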

In the next example, <bar> contains a CDATA section. Its content will not be parsed, so the text of <bar> is literally <test>content!</test>.

    <?xml version="1.0"?>
    <foo>
      <bar><![CDATA[<test>content!</test>]]></bar>
    </foo>

There are several content models in SGML. The #PCDATA content model says that an element may contain plain text. The "parsed" part means that markup in it (including processing instructions, comments and SGML directives) is parsed rather than treated as raw text, and that entity references are replaced.

Another content model that allows plain text is CDATA. In SGML, it means that markup and entity references in the element's contents are ignored. In XML, an element cannot be declared with a CDATA content model. In attributes of CDATA type, however, entity references are replaced.

In XML, #PCDATA is the only plain-text content model; use it whenever an element should be allowed to contain text. CDATA treatment is available only explicitly, by wrapping text in a CDATA section (<![CDATA[ ... ]]>) inside a #PCDATA element; an element's contents cannot be declared as CDATA by default.

In a DTD, an attribute that holds arbitrary text is declared with the type CDATA. The CDATA keyword in an attribute declaration has a different meaning from a CDATA section in an XML document. In a CDATA section all characters are legal (including <, >, &, ' and "), except the sequence ]]>, which ends the section.
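The two meanings of CDATA can be put side by side in one document. The sketch below assumes Python's standard xml.etree.ElementTree, which reads the internal DTD subset but does not validate against it; the title attribute is a made-up example.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ELEMENT foo (bar)>
  <!ELEMENT bar (#PCDATA)>
  <!ATTLIST bar title CDATA #IMPLIED>
]>
<foo><bar title="Tom &amp; Jerry"><![CDATA[Tom &amp; <Jerry>]]></bar></foo>"""

bar = ET.fromstring(doc).find("bar")

# CDATA attribute type: the entity reference in the value is replaced.
print(bar.get("title"))

# CDATA section: the same characters are kept exactly as written.
print(bar.text)
```

The attribute value comes back with a real ampersand, while the CDATA section keeps the five characters &amp; untouched.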

#PCDATA is not a valid attribute type. It is used only in element content models, to describe "leaf" text.

#PCDATA is prefixed with a hash in the content model to distinguish the keyword from an element named PCDATA (which would be perfectly legal).
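As an illustration of why the hash matters, the following sketch declares a mixed content model that allows both text (#PCDATA) and an element that happens to be named PCDATA. The note element is invented for this example, and Python's xml.etree.ElementTree is assumed; it parses the declarations without validating the document against them.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA | PCDATA)*>
  <!ELEMENT PCDATA (#PCDATA)>
]>
<note>plain text <PCDATA>text inside an element named PCDATA</PCDATA></note>"""

note = ET.fromstring(doc)

# The leading text is character data allowed by the #PCDATA keyword.
print(repr(note.text))

# The child is an ordinary element whose name is PCDATA.
print(note[0].tag, "->", note[0].text)
```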

Rose Perrone answered Sep 19 '22 17:09