Is there a way to handle Unicode characters like \u0016 in XML? As I understand it, loading such characters into an XMLDocument throws an "invalid hexadecimal character" error. I tried other Unicode characters and they seem to work fine; only the control characters cause this error. Can we remove these characters without actually parsing the XML?
XML does not support certain Unicode characters (the NUL character, anything in XML's RestrictedChar category, and permanently undefined Unicode characters). However, you can accidentally send them through the REST API. For more information about these characters, go to section 2.2 of the XML 1.1 specification.
The unicode is and it's being used in an XML document. That's not Unicode; it's a numeric character reference (often loosely called a numeric character entity).
Unicode provides a unique number for every character including punctuation marks, mathematical symbols, technical symbols, arrows, and characters making up non-Latin alphabets such as Thai, Chinese, or Arabic script.
Characters are denoted using the notation used in the Unicode Standard, that is, an optional U+ followed by their hexadecimal number, using at least 4 digits, such as "U+1234" or "U+10FFFD". In XML or HTML this could be expressed as "&#x1234;" or "&#x10FFFD;".
from Unicode Technical Report.
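To make the two notations concrete, here is a small Python sketch that renders a character both ways. The helper names `u_plus` and `char_ref` are made up for this illustration and are not part of any library.

```python
def u_plus(ch: str) -> str:
    """Unicode Standard notation: 'U+' plus at least 4 uppercase hex digits."""
    return f"U+{ord(ch):04X}"


def char_ref(ch: str) -> str:
    """Hexadecimal numeric character reference as written in XML or HTML."""
    return f"&#x{ord(ch):X};"


for ch in ("\u1234", "\U0010FFFD"):
    print(u_plus(ch), char_ref(ch))
```

Note that the reference form is only a way of spelling a code point in markup; it does not make a character legal if XML forbids that code point.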
Valid characters in XML:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
from Extensible Markup Language (XML) 1.0 (Fifth Edition)
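As for removing the offending characters without parsing: one possible approach is to filter the raw text against the Char production above before handing it to a parser. The sketch below is in Python rather than .NET, and the function name `strip_invalid_xml_chars` is hypothetical. It assumes the input is already a decoded string, and it also drops numeric character references such as `&#x16;` that point at forbidden code points, since XML 1.0 rejects those as well. Because it works on plain text, it would also touch reference-like sequences inside CDATA sections, where they are ordinary text.

```python
import re
import xml.etree.ElementTree as ET

# Anything NOT matched by the XML 1.0 Char production.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# Decimal (&#22;) or hexadecimal (&#x16;) numeric character references.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def _is_xml_char(cp: int) -> bool:
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def _drop_bad_ref(match: re.Match) -> str:
    """Keep a reference to a legal character, delete one to an illegal character."""
    hex_digits, dec_digits = match.group(1), match.group(2)
    cp = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    return match.group(0) if _is_xml_char(cp) else ""


def strip_invalid_xml_chars(text: str) -> str:
    """Remove literal characters and character references that XML 1.0 forbids."""
    text = _INVALID_XML_CHARS.sub("", text)
    return _CHAR_REF.sub(_drop_bad_ref, text)


raw = "<a>bad\x16char &#x16; ok &#x41;</a>"
clean = strip_invalid_xml_chars(raw)
root = ET.fromstring(clean)  # parses once the forbidden characters are gone
```

Silently deleting characters changes the data, so replacing them with a visible placeholder (for example U+FFFD) may be preferable when the content matters.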