I am referring to the XML 1.1 spec.
Look at the definition of NameStartChar:
NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
If I interpret this correctly, the last range (#x10000-#xEFFFF) goes beyond the UTF-16 range of Java's char type. So it must be UTF-32, right? So I need to check pairs of chars against this range, instead of single chars, right?
My questions are:
\u10000 and \uEFFFF
Thank you!
NOTE: Don't worry, I am not trying to write my own XML parser.
EDIT: I am writing a parser that checks whether text input from miscellaneous (non-XML) text formats matches valid XML names.
Have a look at Character.toCodePoint(char, char), which will convert a surrogate pair into a full-range code point. String.codePointAt may well be useful to you, too.
There's a lot of other surrogate support within Character and String. To know exactly which methods to call, we'd need to know the exact details of your situation.
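Without knowing those details, here is only a rough sketch of how the check could look if the input arrives as a String. The class name XmlNameStart and its two helper methods are made up for illustration. The ranges are copied from the NameStartChar production quoted in the question. The only library calls used are String.codePointAt, Character.toChars and Character.toCodePoint.

```java
final class XmlNameStart {
    // Inclusive {low, high} code point ranges, copied from the NameStartChar
    // production quoted in the question. Values are ints, not chars, so the
    // last range above U+FFFF can be written directly.
    private static final int[][] RANGES = {
        {':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
        {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF},
        {0x370, 0x37D}, {0x37F, 0x1FFF}, {0x200C, 0x200D},
        {0x2070, 0x218F}, {0x2C00, 0x2FEF}, {0x3001, 0xD7FF},
        {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF}
    };

    // Hypothetical helper: true if the code point falls in any range above.
    static boolean isNameStartChar(int codePoint) {
        for (int[] range : RANGES) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical helper: codePointAt(0) combines a leading surrogate pair
    // into one code point, so no manual pairing of chars is needed here.
    static boolean startsWithNameStartChar(String s) {
        return !s.isEmpty() && isNameStartChar(s.codePointAt(0));
    }

    public static void main(String[] args) {
        // U+10000 is stored as two chars (a surrogate pair) in a Java String.
        String supplementary = new String(Character.toChars(0x10000));
        System.out.println(supplementary.length());
        System.out.println(startsWithNameStartChar(supplementary));

        // The same pair combined by hand, as suggested above.
        int combined = Character.toCodePoint((char) 0xD800, (char) 0xDC00);
        System.out.println(combined == 0x10000);

        System.out.println(startsWithNameStartChar("1abc"));
    }
}
```

Note that the range bounds are written as int literals such as 0x10000. A Java \u escape takes exactly four hex digits, so \u10000 would be read as \u1000 followed by the character 0. If you already have two separate char values instead of a String, you can pass Character.toCodePoint(high, low) to the same check.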