My understanding of .chars is that it returns "the number of characters in the string in graphemes". My understanding of .ords is that it returns "a list of codepoint numbers, one for the base character of each grapheme in the string". That is, .chars returns the number of graphemes and .ords returns one codepoint (the base) per grapheme. However, the behavior I am seeing in Rakudo 2016.07.1 on MoarVM 2016.07 doesn't seem to match that:
> "\x[2764]\x[fe0e]".chars
1
> "\x[2764]\x[fe0e]".ords.fmt("U+%04x")
U+2764 U+fe0e
> "e\x[301]".ords.fmt("U+%04x")
U+00e9
> "0\x[301]".ords.fmt("U+%04x")
U+0030
The .chars method returns the expected 1 for HEAVY BLACK HEART followed by VARIATION SELECTOR-15 (the text representation ❤︎ rather than the emoji ❤️, which is U+2764 U+fe0f), but then .ords returns both codepoints rather than just the base (I expected just U+2764). Even more confusing, if you call .ords on LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT, you get back U+00e9 (LATIN SMALL LETTER E WITH ACUTE). I was expecting U+0065, as LATIN SMALL LETTER E is the base codepoint. I do get back the expected result when there isn't an NFC version of the string (e.g. U+0030 for 0́).
Is my understanding of .chars and .ords just flawed, or is this a bug?
This is a documentation bug regarding the .ords method, not a bug in Rakudo. One of the core developers has just updated the docs with this commit:
https://github.com/perl6/doc/commit/12ec5fc35e
The corrected text should appear on the site shortly.
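For anyone who actually wants the "one base codepoint per grapheme" result that the old docs described, here is a rough sketch. It relies on Rakudo keeping strings in a normalized form that starts from NFC, which is why .ords reports U+00e9 for "e\x[301]". The helper name base-ords is made up for this example, and "first codepoint of each grapheme's NFD form" is only my approximation of "base character", not something the docs define.

```raku
# Hypothetical helper (not part of Rakudo): approximate the "base" codepoint
# of each grapheme as the first codepoint of that grapheme's NFD form.
sub base-ords(Str $s) {
    $s.comb.map({ .NFD.list[0] })    # .comb with no arguments yields graphemes
}

# The three strings from the question.
for "\x[2764]\x[fe0e]", "e\x[301]", "0\x[301]" -> $s {
    say "chars: ", $s.chars;                                   # grapheme count
    say "ords:  ", $s.ords.map(*.fmt("U+%04x")).join(" ");     # normalized codepoints
    say "NFD:   ", $s.NFD.list.map(*.fmt("U+%04x")).join(" "); # fully decomposed
    say "base:  ", base-ords($s).map(*.fmt("U+%04x")).join(" ");
}
```

Comparing the ords and NFD lines for "e\x[301]" should make the composition visible, while the base line gives one codepoint per grapheme as the question expected.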