
.NET XmlDocument LoadXML and Entities

Tags:

c#

xml

entity

When loading XML into an XmlDocument, i.e.

XmlDocument document = new XmlDocument();
document.LoadXml(xmlData);

is there any way to stop the process from replacing entities? I've got a strange problem where a TM symbol (stored as the numeric character reference &#8482;) in the XML is being converted into the TM character. As far as I'm concerned this shouldn't happen, as the XML document has the encoding ISO-8859-1 (which doesn't have the TM symbol).

Thanks

Gordon Thompson asked Jan 25 '23 02:01



1 Answer

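This is a standard misunderstanding of the XML toolset. The whole business with "&#x" is a syntactic feature designed to cope with character encodings. Your XmlDocument isn't a stream of characters. It has been freed of character encoding issues and instead contains an abstract model of the XML data, in which a character written as a numeric reference and the same character written literally are indistinguishable. The usual names for this kind of model are DOM and Infoset: XmlDocument is .NET's implementation of the W3C DOM, and the XML Information Set is the W3C's more abstract description of what an XML document contains.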

The "&#x" gubbins won't exist in this model because the whole issue is irrelevant there. It will return, if appropriate, when you transform the Infoset back into a character stream in some specific encoding.
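To make that concrete, here is a minimal sketch of the round trip. The `EntityRoundTrip` class, its `Serialize` helper and the `<product>` sample are names made up for illustration; only `XmlDocument`, `XmlWriter` and `XmlWriterSettings` come from the framework. The assumption being illustrated is that when the document is written out through an `XmlWriter` whose encoding is ISO-8859-1, any character that encoding cannot represent should come back as a numeric character reference, while a UTF-8 writer should emit the character itself.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class EntityRoundTrip
{
    // Hypothetical helper: writes the document with the given encoding
    // and returns the resulting bytes decoded back into a string.
    public static string Serialize(XmlDocument document, Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding };
        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                document.Save(writer);
            }
            return encoding.GetString(stream.ToArray());
        }
    }

    public static void Main()
    {
        var document = new XmlDocument();
        document.LoadXml("<product>Widget&#8482;</product>");

        // In the in-memory model the reference has already become U+2122.
        Console.WriteLine(document.DocumentElement.InnerText == "Widget\u2122");

        // ISO-8859-1 has no TM symbol, so the writer is expected to fall
        // back to a numeric character reference on the way out.
        Console.WriteLine(Serialize(document, Encoding.GetEncoding("ISO-8859-1")));

        // UTF-8 can represent U+2122, so the literal character is written.
        Console.WriteLine(Serialize(document, new UTF8Encoding(false)));
    }
}
```

Note that the reference may not come back in the same form it went in: the writer may well choose hexadecimal where the source used decimal. Both forms mean the same character, so any conforming parser will read them identically.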

This misunderstanding is common enough to have made it into the academic literature as one of a collection of similar quirks. Take a look at "XML Fever": http://doi.acm.org/10.1145/1364782.1364795

Simon Gibbs answered Jan 31 '23 04:01
