Obviously it is true for the Latin alphabet. But I'm asking this in a conceptual sense, across languages and the Unicode spec.
Practically, this came up when comparing two strings. If you already know they aren't the same number of bytes, can you consider that enough of a guarantee, across all languages, that they are not differently "cased" versions of the same string?
No.
Consider U+0069 "i", which is the single octet 69 (hex) in UTF-8. In the Turkish and Azerbaijani locales its uppercase form is U+0130 "İ", and that code point is the two-octet UTF-8 sequence C4 B0.
Obligatory note: case is locale-sensitive.
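Even without locale-specific rules, the default Unicode case mappings change the UTF-8 length for some characters. Here is a minimal Python sketch of that; `utf8_len` is a helper name made up for this example, and the three code points are just ones I picked. Note that Python's `str.upper()` and `str.lower()` are locale-independent, so this does not show the Turkish mapping itself.

```python
def utf8_len(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s (hypothetical helper)."""
    return len(s.encode("utf-8"))


# Each pair is (original, case-converted form) using Python's
# locale-independent default case mappings.
pairs = [
    ("\u0131", "\u0131".upper()),  # dotless i (U+0131) -> I (U+0049)
    ("\u212a", "\u212a".lower()),  # Kelvin sign (U+212A) -> k (U+006B)
    ("\u023a", "\u023a".lower()),  # A with stroke (U+023A) -> U+2C65
]

for original, converted in pairs:
    print(
        f"U+{ord(original):04X}: {utf8_len(original)} byte(s) -> "
        f"{utf8_len(converted)} byte(s)"
    )
```

So a difference in byte length cannot be used as a shortcut to rule out a case-insensitive match.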
There is no principle or invariant in the Unicode standard that guarantees this. I would be particularly concerned about accented capitals, where there may be a mismatch between precomposition and non-precomposition across cases. However, I can't cite an example of a problem for you.
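One possible illustration of the precomposition concern, sketched in Python under the assumption that canonically equivalent strings should count as "the same string": a precomposed lowercase "é" and a decomposed uppercase "É" differ only in case and normalization form, yet their UTF-8 lengths differ. `same_ignoring_case` is a hypothetical helper, and casefold-then-NFC is a simplification, not the full caseless-matching procedure from the Unicode standard.

```python
import unicodedata

# Precomposed lowercase e-acute: the single code point U+00E9.
lower_nfc = unicodedata.normalize("NFC", "\u00e9")
# Decomposed uppercase E-acute: U+0045 followed by combining U+0301.
upper_nfd = unicodedata.normalize("NFD", "\u00c9")


def same_ignoring_case(a: str, b: str) -> bool:
    """Simplified caseless comparison: case-fold, then normalize to NFC."""
    def key(s: str) -> str:
        return unicodedata.normalize("NFC", s.casefold())
    return key(a) == key(b)


print(len(lower_nfc.encode("utf-8")))            # UTF-8 length of the NFC form
print(len(upper_nfd.encode("utf-8")))            # UTF-8 length of the NFD form
print(same_ignoring_case(lower_nfc, upper_nfd))  # whether they match caselessly
```

Here the two byte lengths are 2 and 3, while the comparison still reports a match.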