 

Use DTD to Define an Element as CDATA?

In short, is it possible to use a DTD to define an element as containing CDATA?

I'm calling a third-party API that puts markup XML can't handle inside one element. Specifically, the data contains HTML entities like &rsquo;, which XML does not predefine. When I try to parse this XML with SimpleXML, I of course get the parser error "Entity 'rsquo' not defined". Here's a simplified example of the structure I'm dealing with:

<items>
    <item>
        <name>Jim Smith</name>
        <description>Jim&rsquo;s description breaks my parser</description>
    </item>
</items>

Since I have no control over the API response, I've resorted to this dirty trick of injecting a CDATA section into the problem element just before I parse it:

$xml = str_replace("<description>", "<description><![CDATA[", $xml);
$xml = str_replace("</description>", "]]></description>", $xml);
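A minimal, self-contained version of that workaround is shown below as a sketch, assuming PHP 7+ with the SimpleXML extension. One consequence is easy to miss: inside a CDATA section &rsquo; is no longer an entity reference, only the literal characters "&rsquo;", so the text still has to be decoded after parsing. The helper name wrapDescriptionsInCdata is hypothetical, and the sketch assumes the description element has no attributes and never contains the sequence "]]>".

```php
<?php
// Hypothetical helper: wrap every <description> body in a CDATA section so
// the XML parser stops trying to resolve HTML entities such as &rsquo;.
// Assumes <description> never has attributes and its content never contains
// "]]>", which would end the CDATA section early.
function wrapDescriptionsInCdata(string $xml): string
{
    // str_replace accepts arrays, so both substitutions happen in one call.
    return str_replace(
        ['<description>', '</description>'],
        ['<description><![CDATA[', ']]></description>'],
        $xml
    );
}

$xml = <<<XML
<items>
    <item>
        <name>Jim Smith</name>
        <description>Jim&rsquo;s description breaks my parser</description>
    </item>
</items>
XML;

$doc = simplexml_load_string(wrapDescriptionsInCdata($xml));

// Casting to string returns the CDATA text, with the entity still undecoded.
$raw = (string) $doc->item->description; // Jim&rsquo;s description breaks my parser

// Decode the HTML entities afterwards to get the real characters.
$text = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $text, PHP_EOL; // Jim’s description breaks my parser
```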

This fixes the issue for me, but isn't the overhead too big? The XML can be anywhere from 30K to 100K of data.

I'd rather use a DTD, but for the life of me I can't find anything in the specs that allows declaring an element's content as CDATA (the way I can declare #PCDATA). Below is what I'd like to do, but of course it's invalid because of the #CDATA declaration:

<!DOCTYPE items [
    <!ELEMENT items (item*)>
    <!ELEMENT item (name, description)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT description (#CDATA)>
]>

Thanks for any insights!

Jared Cobb asked Oct 16 '25 16:10



1 Answer

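It is possible in SGML DTDs (for example, HTML 4.01 declares the content of the script element as CDATA), but not in XML DTDs, where an element's content can only be EMPTY, ANY, mixed content (#PCDATA, optionally with child elements), or element content. That is why XHTML 1.0 declares script as #PCDATA, so script text containing < or & has to be escaped or wrapped in a CDATA section.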
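Because an XML DTD cannot declare an element as CDATA, any fix has to happen on the string before it reaches the parser. One alternative to wrapping whole elements in CDATA (a swapped-in technique, not something the question or answer proposes) is to rewrite only the named entity references XML doesn't predefine. The sketch below assumes PHP 7+, UTF-8 input, and that no entity references appear inside comments or CDATA sections that must stay verbatim; the function name htmlEntitiesToXml is hypothetical.

```php
<?php
// Hypothetical helper: turn HTML named entities (e.g. &rsquo;) into characters
// XML can parse, leaving the five XML-predefined entities and numeric
// references (&#8217;) alone. Assumes the document is UTF-8.
function htmlEntitiesToXml(string $xml): string
{
    // XML 1.0 predefines only these five named entities.
    $xmlPredefined = ['amp', 'lt', 'gt', 'quot', 'apos'];

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($xmlPredefined): string {
            if (in_array($m[1], $xmlPredefined, true)) {
                return $m[0]; // already valid XML
            }
            // Decode the HTML entity to its character(s), e.g. &rsquo; -> ’.
            // Unknown names come back unchanged, e.g. "&bogus;".
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            // Re-escape so a decoded "<" or "&" (or an unknown name) stays
            // well-formed XML instead of breaking the document.
            return htmlspecialchars($decoded, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        },
        $xml
    );
}

$xml = '<items><item><name>Jim Smith</name>'
     . '<description>Jim&rsquo;s description breaks my parser</description>'
     . '</item></items>';

$doc = simplexml_load_string(htmlEntitiesToXml($xml));
echo (string) $doc->item->description, PHP_EOL; // Jim’s description breaks my parser
```

Unlike the CDATA trick, this does not depend on element names or on the content avoiding "]]>", and the parsed text needs no decoding afterwards.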

Quentin answered Oct 18 '25 08:10



