When loading XML into an XmlDocument, i.e.
XmlDocument document = new XmlDocument(); document.LoadXml(xmlData);
is there any way to stop the process from replacing entities? I've got a strange problem where I've got a TM symbol (stored as the entity #8482) in the xml being converted into the TM character. As far as I'm concerned this shouldn't happen as the XML document has the encoding ISO-8859-1 (which doesn't have the TM symbol)
Thanks
This is a standard misunderstanding of the XML toolset. The whole business with "&#x", is a syntactic feature designed to cope with character encodings. Your XmlDocument isn't a stream of characters - it has been freed of character encoding issues - instead it contains an abstract model of XML type data. Words for this include DOM and InfoSet, I'm not sure exactly which is accurate.
The "&#x" gubbins won't exist in this model because the whole issue is irrelevant, it will return - if appropriate - when you transform the Info Set back into a character stream in some specific encoding.
This misunderstanding is sufficiently common to have made it into academic literature as part of a collection of similar quirks. Take a look at "Xml Fever" at this location: http://doi.acm.org/10.1145/1364782.1364795
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With