
How to validate CDATA section for an XML in PHP

Tags: php, xml

I create an XML document from user input. One of its nodes contains a CDATA section. If the text inserted into the CDATA section includes a 'special' character (a control character, I think), the entire XML document becomes invalid.

Example:

$dom = new DOMDocument('1.0', 'utf-8');
$dom->appendChild($dom->createElement('root'))
    ->appendChild($dom->createCDATASection(
        "This is some text with a SOH char \x01."
    ));

$test = new DOMDocument;
$test->loadXml($dom->saveXML());
echo $test->saveXml();

This gives the following output:

Warning: DOMDocument::loadXML(): CData section not finished
This is some text with a SOH cha in Entity, line: 2 in /newfile.php on line 17

Warning: DOMDocument::loadXML(): PCDATA invalid Char value 1 in Entity, line: 2 in /newfile.php on line 17

Warning: DOMDocument::loadXML(): Sequence ']]>' not allowed in content in Entity, line: 2 in /newfile.php on line 17

Warning: DOMDocument::loadXML(): Sequence ']]>' not allowed in content in Entity, line: 2 in /newfile.php on line 17

Warning: DOMDocument::loadXML(): internal errorExtra content at the end of the document in Entity, line: 2 in /newfile.php on line 17
<?xml version="1.0"?>

Is there a good way in PHP to make sure the CDATA section is valid?

johnlemon asked Dec 21 '22 05:12


1 Answer

The characters allowed in a CDATA section are those matched by the XML 1.0 Char production:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

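The SOH character (#x1) in your example falls outside these ranges, which is why the parser rejects the document. So you have to sanitize your string so that it contains only those characters.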
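
One way to do that sanitizing is a single regular expression that removes everything outside the ranges listed above. This is only a minimal sketch: the function name sanitizeForCdata is hypothetical (it is not part of PHP or DOM), and it assumes the input is UTF-8 and that silently dropping the offending characters is acceptable for your data.

```php
<?php
// Hypothetical helper, not a built-in: removes every character that is
// outside the XML 1.0 Char ranges quoted in the answer.
// Assumption: $text is UTF-8 encoded.
function sanitizeForCdata(string $text): string
{
    // Negated character class of the allowed ranges; the /u modifier makes
    // PCRE treat the pattern and the subject as UTF-8 code points.
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );

    // With /u, preg_replace() returns null when the subject is not valid UTF-8.
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return $clean;
}

// The SOH byte from the question is dropped; the rest of the text is kept.
echo sanitizeForCdata("This is some text with a SOH char \x01."), "\n";
```

In the example from the question, the string would be passed through this helper before it reaches the DOM, e.g. $dom->createCDATASection(sanitizeForCdata($input)). If dropping characters is not acceptable, the same pattern can be used with preg_match() to reject the input instead.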

Gordon answered Dec 24 '22 01:12