
Is the order in which combining diacritic marks appear after a codepoint important?

I wonder whether the order in which combining diacritic marks appear after a codepoint changes how the diacritics should be stacked above or below the character, or whether it makes some other semantic difference.

Does normalization specify a way to reorder diacritics, e.g. to speed up string comparison?

soc asked May 31 '11 09:05


2 Answers

According to this Wikipedia article, the order of combining characters is significant in some cases, and in other cases it is normalized away as the standard specifies.

Concretely, the relative order of combining characters with the same canonical combining class must be preserved (i.e. it is significant), while marks with different non-zero combining classes are sorted by their combining class during normalization.
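The reordering of marks with different combining classes can be checked with a minimal sketch using Python's standard `unicodedata` module. The base letter `q` and the two dot marks are only illustrative choices, picked because no precomposed character exists for them; the variable names are not from the answer above.

```python
import unicodedata

# Illustrative marks with *different* canonical combining classes.
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE, class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, class 220

class_above = unicodedata.combining(DOT_ABOVE)
class_below = unicodedata.combining(DOT_BELOW)

# The same marks typed in two different orders.
above_first = "q" + DOT_ABOVE + DOT_BELOW
below_first = "q" + DOT_BELOW + DOT_ABOVE

# Normalization sorts the marks by combining class (lower class first),
# so both inputs end up as the same sequence.
nfd_above_first = unicodedata.normalize("NFD", above_first)
nfd_below_first = unicodedata.normalize("NFD", below_first)

print(class_above, class_below)
print(above_first == below_first)
print(nfd_above_first == nfd_below_first)
```

The raw strings differ, but after normalization they compare equal, which is why strings should be normalized before they are compared.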

Joachim Sauer answered Nov 02 '22 04:11


Yes, it's important, and it has to be in order to make some cases unambiguous:

  • Normal form D: u, U+0308, U+0304 -> Normal form C: U+01D6 Latin Small Letter U With Diaeresis And Macron ǖ

  • Normal form D: u, U+0304, U+0308 -> Normal form C: U+1E7B Latin Small Letter U With Macron And Diaeresis ṻ

In general, within a combining class you start closest to the letter and work away from it. Both U+0308 and U+0304 have combining class 230, so normalization never reorders them relative to each other.
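The two cases above can be reproduced with a short sketch using Python's standard `unicodedata` module. The variable names are only illustrative.

```python
import unicodedata

DIAERESIS = "\u0308"  # COMBINING DIAERESIS, class 230
MACRON = "\u0304"     # COMBINING MACRON, class 230

# Same base letter and same marks, in two different orders.
diaeresis_first = "u" + DIAERESIS + MACRON
macron_first = "u" + MACRON + DIAERESIS

# Both marks share a combining class, so their order is preserved
# and NFC composes each sequence into a different character.
nfc_diaeresis_first = unicodedata.normalize("NFC", diaeresis_first)
nfc_macron_first = unicodedata.normalize("NFC", macron_first)

for composed in (nfc_diaeresis_first, nfc_macron_first):
    print(f"U+{ord(composed):04X}", unicodedata.name(composed))
```

Because the two results are distinct codepoints, the two input sequences are not canonically equivalent and must not be treated as equal in comparisons.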

bobince answered Nov 02 '22 05:11