I wonder whether the order in which combining diacritical marks appear after a code point changes how the diacritics should be stacked above or below the character, or whether there is some other semantic difference.
Does normalization specify a way to reorder diacritics, e.g. to speed up string comparison?
According to this Wikipedia article the order of combining characters is relevant in some cases and should be normalized as specified in other cases.
Concretely, the order of combining characters with the same combining class must be preserved (i.e. it is significant), while characters with different combining classes must be sorted by combining class.
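A quick way to see this rule in action is Python's standard `unicodedata` module. This is only a sketch: the base letter `a` and the particular marks are my own choice for illustration, not something taken from the article.

```python
import unicodedata

base = "a"
dot_below = "\u0323"   # COMBINING DOT BELOW, combining class 220
diaeresis = "\u0308"   # COMBINING DIAERESIS, combining class 230
macron = "\u0304"      # COMBINING MACRON, combining class 230

# Different combining classes: normalization sorts the marks by class,
# so both input orders end up as the same sequence.
above_then_below = unicodedata.normalize("NFD", base + diaeresis + dot_below)
below_then_above = unicodedata.normalize("NFD", base + dot_below + diaeresis)
different_classes_equal = above_then_below == below_then_above

# Same combining class: the original order is preserved,
# so the two strings remain distinct after normalization.
diaeresis_first = unicodedata.normalize("NFD", base + diaeresis + macron)
macron_first = unicodedata.normalize("NFD", base + macron + diaeresis)
same_class_equal = diaeresis_first == macron_first

print(different_classes_equal, same_class_equal)
```

`unicodedata.combining(ch)` returns the canonical combining class of a character, which is handy for checking which of the two cases a given pair of marks falls into.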
Yes, it's important, and it has to be in order to make some cases unambiguous:
Normal form D: u (U+0075), U+0308, U+0304 -> Normal form C: U+01D6 Latin Small Letter U With Diaeresis And Macron ǖ
Normal form D: u (U+0075), U+0304, U+0308 -> Normal form C: U+1E7B Latin Small Letter U With Macron And Diaeresis ṻ
In general, within a combining class, you start with the mark closest to the letter and work outward from it.
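If you want to check the two cases yourself, here is a small sketch using Python's standard `unicodedata` module (the variable names are just for illustration). It composes both sequences to NFC and prints the code point and name of each result.

```python
import unicodedata

# Both marks have combining class 230, so their relative order is significant.
diaeresis_then_macron = "u\u0308\u0304"
macron_then_diaeresis = "u\u0304\u0308"

composed_a = unicodedata.normalize("NFC", diaeresis_then_macron)
composed_b = unicodedata.normalize("NFC", macron_then_diaeresis)

# Show which precomposed character each sequence maps to.
for s in (composed_a, composed_b):
    print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s])
```

Decomposing the results again with `unicodedata.normalize("NFD", ...)` gives back the original sequences, each with its mark order intact.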