I have the following XML code.
<firstname>
<default length="6">Örwin</default>
<short>Örwin</short>
<shorter>Örwin</shorter>
<shortest>�.</shortest>
</firstname>
Why does the content of the "shortest" node break? It should be a simple "Ö" instead of the tedious �. The XML is UTF-8 encoded, and the function that processes the output of that node also writes the content of "short" and "shorter", where the "Ö" is clearly visible.
My guess is that the XML isn't properly UTF-8 encoded. Please show the bytes within the <shortest> element in the raw file... I suspect you'll find they're not a validly encoded character. If you could show a short but complete program which generates this XML from valid input, that would be very helpful. (Preferably saying which platform it is, too :)
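If you don't have a hex editor to hand, something like the following would dump those bytes. This is only a rough sketch in Python (the question doesn't say which platform is in use), and `element_bytes_hex` is a made-up helper name. It searches the raw bytes directly rather than going through an XML parser, so that no decoding step can hide the problem.

```python
import re


def element_bytes_hex(raw: bytes, tag: str) -> str:
    """Return the raw bytes between <tag ...> and </tag> as spaced hex.

    Hypothetical helper: works on undecoded bytes, so an invalid or
    unexpected encoding shows up exactly as it is stored in the file.
    """
    tag_bytes = re.escape(tag.encode("ascii"))
    pattern = re.compile(
        rb"<" + tag_bytes + rb"(?:\s[^>]*)?>(.*?)</" + tag_bytes + rb">",
        re.DOTALL,
    )
    match = pattern.search(raw)
    if match is None:
        raise ValueError(f"no <{tag}> element found")
    return match.group(1).hex(" ").upper()


# In-memory stand-in for the file; for a real file, read it in binary
# mode instead, e.g. open(path, "rb").read().
sample = "<firstname><shortest>\ufffd.</shortest></firstname>".encode("utf-8")
print(element_bytes_hex(sample, "shortest"))  # EF BF BD 2E
```

Reading the file in binary mode matters here: opening it as text would decode the bytes first and could silently replace anything invalid.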
EDIT: Something very odd is going on in this file. Here are the hex values for the "shorter" and "shortest" values:
Shorter: C3 96 72 77 69 63
Shortest: EF BF BD 2E
Now "C3 96" is the valid UTF-8 encoding for U+00D6 which is "Latin capital letter O with diaeresis" as you want.
However, EF BF BD is the UTF-8 encoding for U+FFFD which is "replacement character" - definitely not what you want. (The 2E is just the ASCII dot.)
So, this is actually valid UTF-8 - but it doesn't contain the characters you want. Again, you should examine what created the file...
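One common way to end up with exactly these bytes, and this is only a guess since the generating code hasn't been shown, is to shorten the name by bytes rather than by characters. "Ö" takes two bytes in UTF-8, so keeping only the first byte leaves an incomplete sequence, and a lenient decoder substitutes U+FFFD for it. A minimal Python sketch of that assumption (the function names are hypothetical):

```python
def first_byte_as_text(name: str) -> str:
    """Assumed bug: truncate the UTF-8 bytes to one byte, then decode leniently."""
    truncated = name.encode("utf-8")[:1]  # b"\xc3", half of "Ö"
    return truncated.decode("utf-8", errors="replace")


def first_char_as_text(name: str) -> str:
    """Truncate by character instead, which keeps "Ö" intact."""
    return name[:1]


def utf8_hex(text: str) -> str:
    """UTF-8 bytes of text as spaced upper-case hex."""
    return text.encode("utf-8").hex(" ").upper()


print(utf8_hex(first_byte_as_text("Örwin") + "."))  # EF BF BD 2E
print(utf8_hex(first_char_as_text("Örwin") + "."))  # C3 96 2E
```

If the code that writes "shortest" does anything like the first function (a byte-based substring, or a fixed-width byte buffer), that would explain why only the one-character form is affected while "short" and "shorter" come out fine.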