
Reading ° degree symbol from XML in Delphi 2010

Delphi cannot read the following XML, apparently because it contains a ° symbol:

V1:   <Item Id="1" Description="90° Hinge"/>

It seems that Delphi does not recognise the "standard" way of doing this in XML:

V2:   <Item Id="1" Description="90&deg; Hinge"/>

Delphi does seem to handle this ok:

V3:   <Item Id="1" Description="90&#176; Hinge"/>

Since I'm getting the data from a RESTful web service, I don't have control over the XML packets coming across. I just need to be able to read them.

Questions

  1. If V2 is the standard XML way of doing it, then why doesn't Delphi support this? Or is there a special way to handle this that I'm not aware of?
  2. Is the V1 XML badly formed to begin with? If so, should I request that the RESTful interface be changed to export ° in V3 format?

Using Delphi 2010. Any help would be appreciated.

Rick Wheeler asked Feb 22 '13 05:02


3 Answers

Delphi itself doesn't parse the XML at all. A third-party XML engine does, whether it be MSXML, OpenXML, ADOM XML, etc. The TXMLDocument component and its supporting interfaces are just a wrapper framework; the bulk of the parsing is done by the underlying engine.

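V1 may or may not be malformed. It depends on the charset the XML is actually encoded in, and on whether that matches the encoding the document declares (UTF-8 is assumed when nothing is declared).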

V2 is actually not standard XML. &deg; is an HTML named entity, and an XML engine will only accept it if the document's DTD declares it. Clearly, the one you are using with Delphi does not.

V3 is standardized, and all XML engines support that syntax.
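To illustrate that this is XML parser behaviour rather than anything specific to Delphi, here is a minimal sketch using Python's standard-library parser (expat, via xml.etree.ElementTree), not the engine a Delphi application would use. The helper name read_description is made up for the example, and the three inputs are the variants from the question.

```python
import xml.etree.ElementTree as ET

# The three variants from the question, as Unicode strings.
variants = {
    "V1": '<Item Id="1" Description="90\u00b0 Hinge"/>',
    "V2": '<Item Id="1" Description="90&deg; Hinge"/>',
    "V3": '<Item Id="1" Description="90&#176; Hinge"/>',
}


def read_description(xml_text):
    """Return the Description attribute, or None if the parser rejects the XML."""
    try:
        return ET.fromstring(xml_text).get("Description")
    except ET.ParseError:
        # V2 ends up here: "deg" is not declared anywhere in the document.
        return None


results = {name: read_description(text) for name, text in variants.items()}
for name, value in results.items():
    print(name, repr(value))
```

With this parser, V1 and V3 yield the same attribute value and V2 is rejected as an undefined entity, which matches what the question reports for Delphi.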

Remy Lebeau answered Oct 23 '22 15:10


V1:   <Item Id="1" Description="90° Hinge"/>

Here you have directly encoded the character. Whether or not your code can parse this depends on the charset used by your XML document. So, if your XML document uses UTF-8 and is correctly encoded then your XML code will be able to parse it.

V2:   <Item Id="1" Description="90&deg; Hinge"/>

This uses a named entity, deg. In XML there are only five pre-defined named entities: quot, amp, apos, lt, gt. It is possible for an XML document to define other named entities, however that is unusual. So, it would seem that deg is not a valid named entity for your document.
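For completeness, this is roughly what "defining other named entities" looks like. It is only a sketch: the DOCTYPE below is hypothetical and is not part of the XML in the question, and the parser is Python's standard-library one rather than a Delphi DOM vendor.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that declares the "deg" entity itself,
# in an internal DTD subset placed before the root element.
with_declaration = (
    '<!DOCTYPE Item [<!ENTITY deg "&#176;">]>'
    '<Item Id="1" Description="90&deg; Hinge"/>'
)

# Because "deg" is now declared, the parser expands it while reading the attribute.
description = ET.fromstring(with_declaration).get("Description")
print(repr(description))
```

Since you do not control the XML coming from the web service, this mainly explains why &deg; works in HTML-derived documents that carry such declarations and fails in a bare XML fragment.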

V3:   <Item Id="1" Description="90&#176; Hinge"/>

This version uses a numeric character reference, NCR. You can use an NCR to specify any Unicode character that XML permits, regardless of the document's encoding.


As to what you should do going forward, we can immediately rule out the named entity. I would also recommend avoiding wholesale use of NCRs for all non-ASCII characters. That just leads to unreadable documents. Of course, if you must use a non-Unicode-aware tool to process the document, then using NCRs is the only approach.
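If a non-Unicode-aware tool really is in the chain, the conversion to NCRs can be done mechanically. A minimal sketch in Python, using the standard "xmlcharrefreplace" error handler; the variable names are made up for the example:

```python
import xml.etree.ElementTree as ET

readable = '<Item Id="1" Description="90\u00b0 Hinge"/>'

# "xmlcharrefreplace" writes every character that ASCII cannot represent
# as a numeric character reference, so the result is pure ASCII bytes.
ascii_only = readable.encode("ascii", "xmlcharrefreplace")
print(ascii_only)

# Both forms carry the same attribute value once parsed.
same_value = (
    ET.fromstring(readable).get("Description")
    == ET.fromstring(ascii_only).get("Description")
)
print(same_value)
```

This blanket replacement is only safe when the non-ASCII characters sit in text or attribute values, as they do here. Character references are not allowed inside element or attribute names.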

So that leaves us with directly encoding non-ASCII characters. Make sure that your XML is properly encoded as UTF-8 and that approach will work well, leading to readable and clean documents.
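What "properly encoded" means in practice is that the bytes on the wire match the encoding the document declares. The sketch below, again with Python's standard-library parser and a made-up helper named parse_bytes, tries the combinations for the V1 element:

```python
import xml.etree.ElementTree as ET

body = '<Item Id="1" Description="90\u00b0 Hinge"/>'


def parse_bytes(declared, actual):
    """Declare one encoding, encode the bytes with another, and try to parse."""
    text = '<?xml version="1.0" encoding="%s"?>%s' % (declared, body)
    try:
        return ET.fromstring(text.encode(actual)).get("Description")
    except ET.ParseError:
        return None


utf8_ok = parse_bytes("UTF-8", "utf-8")              # declaration matches bytes
latin1_ok = parse_bytes("iso-8859-1", "iso-8859-1")  # declaration matches bytes
rejected = parse_bytes("UTF-8", "iso-8859-1")        # lone byte 0xB0 is not valid UTF-8
garbled = parse_bytes("iso-8859-1", "utf-8")         # parses, but the text is wrong

for label, value in [("utf8_ok", utf8_ok), ("latin1_ok", latin1_ok),
                     ("rejected", rejected), ("garbled", garbled)]:
    print(label, repr(value))
```

The third case is the likely shape of the V1 failure: single-byte data served with a UTF-8 declaration or with no declaration at all. The fourth case is worse in a way, because it parses without error and silently produces the wrong text.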

David Heffernan answered Oct 23 '22 16:10


Just elaborating on David's answer: XML doesn't rule out any character in a text node (except for a very few reserved characters) as long as it is valid in the document's encoding.

There are a few missing facts from your question:

  1. Are you producing this XML using a text editor? If so, check which encoding you are using when saving the file. Try UTF-8. If your documents are saved in a single-byte "Windows" encoding, try adding an encoding attribute to the XML declaration that names the encoding actually used, e.g., <?xml version="1.0" encoding="iso-8859-1"?>.

  2. Are you producing this XML using Delphi string functions? If so, note that the default string type in Delphi 2009 and later (including Delphi 2010) is UnicodeString, which is UTF-16 rather than UTF-8, so a conversion happens whenever the text is written out as bytes, and you can inadvertently mix encodings if you are reading fragments from external sources. For this problem there is no silver bullet, except for using your XML library's built-in functions to create the XML.

When I have had to deal with these things (for XML signatures, no less!) I resorted to using wrappers for every string, with explicit encodings (I use type Latin1String = type AnsiString(28591), where 28591 is the ISO-8859-1 code page).

Leonardo Herrera answered Oct 23 '22 14:10