
Why does .ords not agree with .chars?

Tags: unicode, raku

My understanding of .chars is that it returns "the number of characters in the string in graphemes". My understanding of .ords is that it returns "a list of codepoint numbers, one for the base character of each grapheme in the string". That is, .chars returns the number of graphemes and .ords returns one codepoint (the base) per grapheme. However, the behavior I am seeing in Rakudo 2016.07.1 on MoarVM 2016.07 doesn't seem to match that:

> "\x[2764]\x[fe0e]".chars
1
> "\x[2764]\x[fe0e]".ords.fmt("U+%04x")
U+2764 U+fe0e
> "e\x[301]".ords.fmt("U+%04x")
U+00e9
> "0\x[301]".ords.fmt("U+%04x")
U+0030

The .chars method returns the expected 1 for HEAVY BLACK HEART followed by VARIATION SELECTOR-15 (text representation ❤︎ rather than emoji ❤️, U+2764 U+fe0f), but then .ords returns both codepoints rather than just the base (I expected just U+2764). Even more confusing, if you call .ords on LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT, you get back U+00e9 (LATIN SMALL LETTER E WITH ACUTE). I was expecting U+0065, as LATIN SMALL LETTER E is the base codepoint. I do get back the expected result when there isn't an NFC version of the string (e.g. U+0030 for 0́).

Is my understanding of .chars and .ords just flawed, or is this a bug?

Chas. Owens asked Sep 20 '16 20:09


1 Answer

This is a documentation bug regarding the .ords method. One of the core developers has just updated the docs with this commit:

https://github.com/perl6/doc/commit/12ec5fc35e

The change should appear on the site shortly.
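
The question notes that the result changes depending on whether "an NFC version of the string" exists. As a rough illustration of that normalization step only, here is a small sketch in Python (not Raku) using the standard unicodedata module. The helper name nfc_codepoints is made up for this example, and the sketch makes no claim about how Rakudo implements .ords or what the updated documentation says.

```python
import unicodedata


def nfc_codepoints(s: str) -> list[str]:
    """Return the codepoints of the NFC-normalized string as U+xxxx labels.

    Hypothetical helper for illustration; it is not part of Raku or Rakudo.
    """
    return [f"U+{ord(c):04x}" for c in unicodedata.normalize("NFC", s)]


# e + COMBINING ACUTE ACCENT has a precomposed form, so NFC yields one codepoint.
print(" ".join(nfc_codepoints("e\u0301")))        # U+00e9

# HEAVY BLACK HEART + VARIATION SELECTOR-15 has no precomposed form.
print(" ".join(nfc_codepoints("\u2764\ufe0e")))   # U+2764 U+fe0e

# DIGIT ZERO + COMBINING ACUTE ACCENT has no precomposed form either.
print(" ".join(nfc_codepoints("0\u0301")))        # U+0030 U+0301
```

The first two results line up with the .ords output shown in the question. The third does not: the Rakudo 2016.07.1 transcript lists only U+0030 for that string, so NFC normalization alone does not reproduce everything that release printed.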

Coke answered Oct 19 '22 15:10