Given that there is now a selector for textual vs emoji display for some code points, what is the proper decomposed form of those code points? For instance, ❤︎ (U+2764) defaults to a text representation, but can become an emoji if followed by VS-16 (U+FE0F): ❤️. You can force a text representation with VS-15 (U+FE0E). Does this mean the NFD for U+2764 should become U+2764 U+FE0E? Should U+2764 U+FE0E and U+2764 be treated as the same (in the same way that é (U+00E9) is canonically equivalent to é (U+0065 U+0301))? What about the text and emoji variants themselves (U+2764 U+FE0E vs U+2764 U+FE0F)? Should they be treated the same as well?
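The Unicode Character Database defines no canonical decomposition mapping for the base characters of emoji variation sequences, and no decomposition mapping at all for the variation selectors. The standard even states: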
The initial character in a variation sequence is never [...] a canonical decomposable character.
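This means that emoji, with or without a variation selector, don't change under NFD (or NFC). U+2764, U+2764 U+FE0E and U+2764 U+FE0F are each already normalized, and, unlike the two spellings of é, none of them is canonically equivalent to another.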
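A quick way to check this for U+2764, assuming Python 3 and its standard `unicodedata` module (whose data follows the Unicode version bundled with the interpreter). The sketch compares the heart variants with the two spellings of é:

```python
import unicodedata

HEART = "\u2764"  # HEAVY BLACK HEART
VS15 = "\ufe0e"   # VARIATION SELECTOR-15 (text presentation)
VS16 = "\ufe0f"   # VARIATION SELECTOR-16 (emoji presentation)

# Neither the base character nor the selectors have a decomposition mapping.
print(repr(unicodedata.decomposition(HEART)))  # ''
print(repr(unicodedata.decomposition(VS15)))   # ''
print(repr(unicodedata.decomposition(VS16)))   # ''

# Every normalization form leaves all three heart sequences untouched.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    for s in (HEART, HEART + VS15, HEART + VS16):
        assert unicodedata.normalize(form, s) == s

# The two spellings of é normalize to the same string (canonical equivalence)...
e_precomposed = "\u00e9"
e_decomposed = "e\u0301"
print(unicodedata.normalize("NFD", e_precomposed) == unicodedata.normalize("NFD", e_decomposed))  # True

# ...but the heart with and without VS-15 does not.
print(unicodedata.normalize("NFD", HEART) == unicodedata.normalize("NFD", HEART + VS15))  # False
```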
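Also, normalization says nothing about how a code point without a variation selector should be displayed. UTS #51 (Unicode Emoji) does classify emoji as default-text or default-emoji presentation via the Emoji_Presentation property (U+2764 has Emoji_Presentation=No, i.e. text by default), but how a bare code point is actually rendered is ultimately up to the implementation.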
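If an application does want ❤ and ❤︎ (or ❤︎ and ❤️) to compare as equal, it has to decide that itself, on top of normalization. A minimal sketch, assuming Python 3; the helper names `presentation_insensitive_key` and `same_ignoring_presentation` are hypothetical and not part of any Unicode algorithm:

```python
import unicodedata

# Hypothetical, application-level choice (not defined by Unicode): ignore only
# the text/emoji presentation selectors VS-15 and VS-16 when comparing strings.
PRESENTATION_SELECTORS = {"\ufe0e", "\ufe0f"}


def presentation_insensitive_key(s: str) -> str:
    """Return the NFD form of s with VS-15/VS-16 removed."""
    nfd = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in nfd if ch not in PRESENTATION_SELECTORS)


def same_ignoring_presentation(a: str, b: str) -> bool:
    """Compare two strings, treating text and emoji presentation as the same."""
    return presentation_insensitive_key(a) == presentation_insensitive_key(b)


print(same_ignoring_presentation("\u2764", "\u2764\ufe0e"))        # True
print(same_ignoring_presentation("\u2764\ufe0e", "\u2764\ufe0f"))  # True
print(same_ignoring_presentation("\u00e9", "e\u0301"))             # True
print("\u2764" == "\u2764\ufe0e")                                  # False
```

Only VS-15 and VS-16 are stripped here because other variation selectors (VS-1 to VS-14, and the ideographic ones) pick specific glyph variants that an application may want to keep distinct; whether to ignore them too is a separate design decision.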