Why are "control" characters illegal in XML 1.0?

Tags: xml, unicode, history

There are a number of characters that cannot legally be encoded in XML 1.0, e.g. U+0007 ('bell') and U+001B ('escape'). Most of the interesting ones are non-whitespace 'control' characters.
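For reference, the set of legal characters is spelled out by the Char production in the XML 1.0 spec: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. Below is a minimal Python sketch of a checker for that production. The function names are hypothetical, chosen only for illustration. Note that DEL (U+007F) and U+200C both fall inside the allowed ranges, while every C0 control except tab, LF and CR falls outside them.

```python
# Legal-character ranges from the XML 1.0 "Char" production.
XML10_CHAR_RANGES = (
    (0x9, 0x9),          # tab
    (0xA, 0xA),          # line feed
    (0xD, 0xD),          # carriage return
    (0x20, 0xD7FF),      # includes DEL (U+007F) and U+200C
    (0xE000, 0xFFFD),    # excludes U+FFFE and U+FFFF
    (0x10000, 0x10FFFF),
)


def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp may appear in an XML 1.0 document."""
    return any(lo <= cp <= hi for lo, hi in XML10_CHAR_RANGES)


def illegal_xml10_chars(text: str) -> list[str]:
    """List the code points in text that XML 1.0 forbids, as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text if not is_xml10_char(ord(ch))]


if __name__ == "__main__":
    sample = "ring\x07 the bell, then \x1b[0m reset; tab\tand newline\n are fine"
    print(illegal_xml10_chars(sample))  # ['U+0007', 'U+001B']
```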

It's clear from (e.g.) this question and others that it's the XML spec that's the issue -- but can anyone illuminate me as to why the XML spec forbids these characters?

It seems like it could have been required that they be encoded in escapes, e.g. as &#x7; and &#x1B; respectively, but perhaps there's a practical reason that the characters were forbidden rather than required to be escaped?
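For context, XML 1.0 doesn't only forbid these characters as raw data. Its "Legal Character" well-formedness constraint also applies to character references, so &#x7; is rejected just like a literal BEL. One quick way to see this is Python's standard-library ElementTree, which is backed by the expat parser. The helper name parses() is hypothetical, used here only for illustration.

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if the (XML 1.0) parser accepts doc as well-formed."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


# A reference to TAB (U+0009) is fine...
print(parses("<a>&#x9;</a>"))   # True
# ...but references to BEL and ESC violate the "Legal Character" constraint,
print(parses("<a>&#x7;</a>"))   # False
print(parses("<a>&#x1B;</a>"))  # False
# ...and so does the raw character itself.
print(parses("<a>\x07</a>"))    # False
```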

Answerers have suggested that the motivation was to avoid transmission control characters, but Unicode includes many other control-like characters (consider U+200C "zero width non-joiner"). I recognize there may be no good reason for this behavior, but I would still like to understand it better.

It's particularly frustrating because when those character values appear in other data formats, I end up "double-escaping" them in new XML documents that need to encode them.


asked Dec 31 '08 21:12

Trochee

2 Answers

My understanding is that this range is barred on the grounds that a markup language should have no need to support transmission and flow control characters, and that including them would create problems for editors and parsers doing binary conversion.

I'm struggling to find anything ex cathedra on this from Tim Bray et al though.

Edit: here is some discussion of control characters, and a vague admission that the design wasn't exactly over-engineered:

At 09:27 AM 17/06/00 -0500, Mark Volkmann wrote:

I've never seen a discussion of the reason why most ASCII control characters, such as a form feed, are not allowed in XML documents. Can anyone tell me the reason behind that decision or point me to a spec. that explains that?

I'm not sure we'd do it the same way if we were doing it again. I don't see that they do any real harm. Clearly, if you're optimizing for a highly interoperable content markup language (and XML is) it's legitimate to be suspicious of things like vertical-tab and backspace and so on... but then how can it be consistent to leave in \n and DEL and so on? -Tim


answered Oct 11 '22 16:10

annakata

It seems like it could have been required that they be encoded in escapes, e.g. as &#x7; and &#x1B;

You can do exactly that in XML 1.1, for all but \0 (U+0000, NUL).
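To make that concrete: XML 1.1 widens Char to [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF], but its RestrictedChar production ([#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]) may only appear as character references. The sketch below escapes text content along those lines. The function name is hypothetical, and this is not a full serializer (attribute quoting, for instance, is not handled). Reading the output back requires a parser that implements XML 1.1; an XML 1.0 processor will still reject the references.

```python
# RestrictedChar ranges from XML 1.1: legal, but only as character references.
XML11_RESTRICTED = (
    (0x1, 0x8), (0xB, 0xC), (0xE, 0x1F), (0x7F, 0x84), (0x86, 0x9F),
)


def escape_xml11_text(text: str) -> str:
    """Escape text for use as XML 1.1 character data (a sketch, not a full serializer)."""
    out = []
    for ch in text:
        cp = ord(ch)
        # Outside XML 1.1's Char production entirely: no escape can represent these.
        if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp in (0xFFFE, 0xFFFF):
            raise ValueError(f"U+{cp:04X} cannot be represented in XML 1.1")
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif any(lo <= cp <= hi for lo, hi in XML11_RESTRICTED):
            # Restricted control character: emit a hex character reference.
            out.append(f"&#x{cp:X};")
        else:
            out.append(ch)
    return "".join(out)


doc = '<?xml version="1.1"?>\n<a>' + escape_xml11_text("bell\x07 esc\x1b") + "</a>"
print(doc)  # second line: <a>bell&#x7; esc&#x1B;</a>
```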

answered Oct 11 '22 16:10

bobince
