 

Proper NFD form of emoji and comparison

Tags:

unicode

nfd

Given that some code points now have a selector for text vs. emoji presentation, what is the proper decomposed form of those code points? For instance, ❤︎ (U+2764) defaults to a text representation, but becomes an emoji if followed by VS-16 (U+FE0F): ❤️. You can force a text representation with VS-15 (U+FE0E).

Does this mean the NFD of U+2764 should be U+2764 U+FE0E? Should U+2764 U+FE0E and U+2764 be treated as the same (in the same way that é (U+00E9) is the same as é (U+0065 U+0301))? What about the text vs. emoji representations? Should they be treated as the same as well?

Chas. Owens asked Nov 25 '25 08:11



1 Answer

There's no decomposition mapping in the Unicode database for emojis and variation selectors. The standard even states:

The initial character in a variation sequence is never [...] a canonical decomposable character.
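One way to check the missing mappings is Python's standard `unicodedata` module, which exposes the decomposition field of the Unicode Character Database. This is an illustration chosen here, not something the answer relies on, and the exact data depends on the Unicode version bundled with the Python build:

```python
import unicodedata

# The heart and both variation selectors from the question, plus "é"
# as a contrast case that does have a canonical decomposition.
chars = ["\u2764", "\ufe0e", "\ufe0f", "\u00e9"]

# decomposition() returns the raw decomposition field from the Unicode
# Character Database, or an empty string when there is no mapping.
mappings = {ch: unicodedata.decomposition(ch) for ch in chars}

for ch, mapping in mappings.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mapping or '(none)'}")
```

Only U+00E9 reports a mapping (`0065 0301`); the heart and both selectors report none.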

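This means that NFD leaves an emoji unchanged, with or without a variation selector: U+2764 U+FE0E normalizes to itself, not to U+2764, so the two are not canonically equivalent (unlike U+00E9 and U+0065 U+0301). The same holds for the text and emoji sequences U+2764 U+FE0E and U+2764 U+FE0F.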
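A short sketch of that consequence, assuming Python's `unicodedata.normalize` as a stand-in for any conforming NFD implementation:

```python
import unicodedata

def nfd(s: str) -> str:
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)

heart = "\u2764"
heart_text = "\u2764\ufe0e"   # followed by VS-15: text presentation requested
heart_emoji = "\u2764\ufe0f"  # followed by VS-16: emoji presentation requested

# NFD leaves all three strings exactly as they were...
assert nfd(heart) == heart
assert nfd(heart_text) == heart_text
assert nfd(heart_emoji) == heart_emoji

# ...so none of them is canonically equivalent to another,
print(nfd(heart) == nfd(heart_text))        # False
print(nfd(heart_text) == nfd(heart_emoji))  # False

# unlike the precomposed and decomposed forms of "é", which are.
print(nfd("\u00e9") == nfd("e\u0301"))      # True
```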

Also, to my knowledge, Unicode doesn't specify the default representation of a code point without a variation selector; this is up to the implementation.
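If an application does want the text and emoji forms to compare as equal (the last part of the question), that is a decision it has to make on top of normalization. A minimal sketch, using a hypothetical helper `presentation_insensitive_key` that is not part of any standard or library:

```python
import unicodedata

# Hypothetical helper: an application-level comparison key that ignores
# the text/emoji presentation selectors. This is NOT Unicode normalization;
# NFD by itself keeps VS-15 and VS-16 in place.
PRESENTATION_SELECTORS = {"\ufe0e", "\ufe0f"}  # VS-15, VS-16

def presentation_insensitive_key(s: str) -> str:
    # Normalize first so canonically equivalent strings still match,
    # then drop the two presentation selectors.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if ch not in PRESENTATION_SELECTORS)

key = presentation_insensitive_key
print(key("\u2764") == key("\u2764\ufe0e"))        # True
print(key("\u2764\ufe0e") == key("\u2764\ufe0f"))  # True
```

This also folds other sequences that use these selectors, such as the keycap sequence U+0031 U+FE0F U+20E3, so whether it is appropriate depends on the application.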

nwellnhof answered Nov 27 '25 22:11



