
What characters do I need to escape in XML documents?

What characters must be escaped in XML documents, or where could I find such a list?

Julius A asked Jul 07 '09 12:07



1 Answer

If you use an appropriate class or library, it will do the escaping for you. Many XML issues are caused by building documents through string concatenation.
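As a rough illustration of letting a library do the escaping, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element name valid and the attribute name attribute are borrowed from the examples in this answer; the sample strings are made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# Sample values (hypothetical) containing all five special characters.
attr_value = 'a "quoted" <value> & it\'s fine'
text_value = "1 < 2 & 3 > 2"

# Build the document through the API instead of concatenating strings.
root = ET.Element("valid")
root.set("attribute", attr_value)
root.text = text_value

# The serializer decides which characters to escape in each context.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)

# Parsing the result gives back the original, unescaped strings.
parsed = ET.fromstring(serialized)
print(parsed.get("attribute") == attr_value, parsed.text == text_value)
```

The round trip at the end is the point of the sketch: the caller never writes an escape by hand, and the parsed values match what was put in.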

XML escape characters

There are only five:

"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;

Which characters need escaping depends on where the special character appears.

The examples can be validated at the W3C Markup Validation Service.

Text

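The safe way is to escape all five characters in text. However, the three characters ", ' and > needn't be escaped in text (the one exception is that > must be escaped when it follows ]], because the sequence ]]> is not allowed in text):

<?xml version="1.0"?>
<valid>"'></valid>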
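If text really has to be escaped by hand, a helper is less error-prone than ad hoc replacements. The following is a small sketch, assuming Python and its standard xml.sax.saxutils module; the sample string is invented. By default escape() rewrites &, < and > and leaves " and ' alone, which matches the rule for text above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Hypothetical text content containing all five special characters.
raw = 'if (a < b && b > c) print("it\'s ok")'

# escape() handles &, < and >; quotes are left untouched in text.
escaped = escape(raw)
print(escaped)

# The escaped string can be placed safely inside an element.
element = ET.fromstring("<valid>" + escaped + "</valid>")
print(element.text == raw)

# unescape() reverses the three replacements.
print(unescape(escaped) == raw)
```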

Attributes

The safe way is to escape all five characters in attributes. However, the > character needn't be escaped in attributes:

<?xml version="1.0"?>
<valid attribute=">"/>

The ' character needn't be escaped in attributes if the quotes are ":

<?xml version="1.0"?>
<valid attribute="'"/>

Likewise, the " needn't be escaped in attributes if the quotes are ':

<?xml version="1.0"?>
<valid attribute='"'/>
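The choice of quote character can also be left to a helper. As a sketch, again assuming Python's standard xml.sax.saxutils module, quoteattr() returns the value already wrapped in quotes: it prefers double quotes, switches to single quotes when the value contains only double quotes, and falls back to &quot; when both kinds appear. The sample values are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical attribute values covering the three quoting cases.
values = ["it's", 'say "hi"', 'both " and \'', "a < b & c"]

for value in values:
    quoted = quoteattr(value)  # includes the surrounding quotes
    document = "<valid attribute=" + quoted + "/>"
    print(document)
    # The parser recovers the original value in every case.
    assert ET.fromstring(document).get("attribute") == value
```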

Comments

Escapes are not recognized in comments, so none of the five special characters should be escaped there (the only sequence that cannot appear is --):

<?xml version="1.0"?>
<valid>
<!-- "'<>& -->
</valid>

CDATA

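Escapes are not recognized in CDATA sections, so none of the five special characters should be escaped there (the only sequence that cannot appear is ]]>, which ends the section):

<?xml version="1.0"?>
<valid>
<![CDATA["'<>&]]>
</valid>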

Processing instructions

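Escapes are not recognized in XML processing instructions, so none of the five special characters should be escaped there (the only sequence that cannot appear is ?>, which ends the instruction):

<?xml version="1.0"?>
<?process <"'&> ?>
<valid/>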
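To see why escapes don't belong in comments, CDATA sections or processing instructions, it can help to parse a document that contains them anyway. This is a minimal sketch, assuming Python's standard xml.dom.minidom; the document is invented, and the process target is taken from the example above. In all three places the escape sequences come back literally rather than being decoded.

```python
from xml.dom import Node, minidom

# A document with raw special characters AND escape sequences in a
# comment, a CDATA section and a processing instruction.
source = (
    '<valid>'
    '<!-- "\'<>& &amp; -->'
    '<![CDATA["\'<>& &lt;]]>'
    '<?process <"\'&> &gt; ?>'
    '</valid>'
)

doc = minidom.parseString(source)

# Index the three child nodes by their DOM node type.
by_type = {node.nodeType: node for node in doc.documentElement.childNodes}
comment = by_type[Node.COMMENT_NODE]
cdata = by_type[Node.CDATA_SECTION_NODE]
pi = by_type[Node.PROCESSING_INSTRUCTION_NODE]

# The escapes are kept as literal text, not turned into & < >.
print(repr(comment.data))
print(repr(cdata.data))
print(pi.target, repr(pi.data))
```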

XML vs. HTML

HTML has its own, much larger set of escape codes (for example &nbsp; and &copy;), which cover a lot more characters. These are not predefined in XML.

Welbog answered Oct 13 '22 01:10