I have an XML document with a tag that contains a user-entered message, and I would like to avoid escaping characters unnecessarily.
According to the link below, the only strictly illegal characters are "<" and "&".
Note: Only the characters "<" and "&" are strictly illegal in XML. The greater than character is legal, but it is a good habit to replace it.
http://www.w3schools.com/xml/xml_syntax.asp
But with some parsers I encountered problems with the sequence ]]>. Is this due to problems with the parsers, or is it really defined as illegal somewhere in the XML standard?
Example message:
<?xml version="1.0" encoding="UTF-8" ?>
<root>
<message>&lt;!-- -- -- &lt;![CDATA["TEST"]]></message>
<signature>Evil</signature>
</root>
As you can see, < and & are escaped, and this message is parsed successfully by C++ tinyxml and Java JAXB. Both Firefox 20.0.1 and IE 8.0, however, report
XML Parsing Error: not well-formed
and
The literal string ']]>' is not allowed in element content.
respectively.
Is this behavior really enforced by the standard?
EDIT: It seems I should have searched some more: Legally use CDATA in XML. So I guess the XML parsers in Firefox and IE are just broken?
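If the message is wrapped in a CDATA section instead of being escaped, the only sequence that still needs handling is ]]> itself, because it would close the section early. A common workaround is to split it across two adjacent CDATA sections. A minimal Python sketch, where wrap_cdata is a hypothetical helper name and the sample text is the message from the question:

```python
import xml.etree.ElementTree as ET

def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' across two sections (hypothetical helper)."""
    # ']]>' would end the section early, so close after ']]' and reopen before '>'
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

message = '<!-- -- -- <![CDATA["TEST"]]>'
doc = "<message>" + wrap_cdata(message) + "</message>"
# The parser concatenates the adjacent CDATA sections back into the original text
print(ET.fromstring(doc).text == message)
```

Nothing inside a CDATA section is parsed as markup, so the embedded <![CDATA[ in the user's text needs no treatment.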
From the XML spec (emphasis mine):
The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and MUST, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.
This means that unless the ]]> sequence marks the end of a CDATA section for the XML parser reading the document, it is not legal without escaping the >, even when it does not occur inside a CDATA section.
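Following the rule quoted above, a writer that wants to escape as little as possible only has to handle &, <, and a > that completes ]]>. A minimal Python sketch, where escape_content is a hypothetical helper name and the sample text is the message from the question:

```python
import xml.etree.ElementTree as ET

def escape_content(text: str) -> str:
    """Escape only what XML 1.0 requires in element content (hypothetical helper)."""
    # Replace '&' first so the entity references added below are not escaped again
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    # '>' only has to be escaped where it completes the sequence ']]>'
    return text.replace("]]>", "]]&gt;")

message = '<!-- -- -- <![CDATA["TEST"]]>'
escaped = escape_content(message)
print(escaped)  # &lt;!-- -- -- &lt;![CDATA["TEST"]]&gt;
# Round trip: the parser turns the references back into the original characters
print(ET.fromstring("<message>" + escaped + "</message>").text == message)  # True
```

Escaping every > unconditionally would also be valid; the targeted replacement just keeps the output closer to what the user typed.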
I'm not familiar with the XML parsers used internally by browsers, but seeing as this requirement is in place for compatibility reasons, your guess seems sound.
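For comparison, Python's standard-library parser, which is built on expat, also rejects the literal sequence in element content. A minimal check, where is_well_formed is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Return True if the standard-library (expat-based) parser accepts doc (hypothetical helper)."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# Literal ']]>' in content, as in the question's example
print(is_well_formed('<message>&lt;![CDATA["TEST"]]></message>'))  # False
# Same content with the '>' escaped
print(is_well_formed('<message>&lt;![CDATA["TEST"]]&gt;</message>'))  # True
```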