I'm reading Programming in Lua, 1st edition (yup, I know it's a bit outdated), and in section 3.2 (about relational operators), the author says:
For instance, with the European Latin-1 locale, we have "acai" < "açaí" < "acorde".
I don't get it. For me, it's OK to have "acai" < "açaí", but why is "açaí" < "acorde"?
AFAIK (and Wikipedia seems to confirm it), "c" < "ç", or am I wrong?
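In the default "C" locale, Lua compares strings byte by byte (it calls the C function strcoll, which in that locale behaves like strcmp), so "c" < "ç" does hold, and that is exactly why "açaí" sorts after "acorde". A minimal sketch: the accented letters are written as decimal byte escapes so the result does not depend on how the file is saved; the variable names are only illustrative.

```lua
-- Force byte-wise collation for this demonstration
os.setlocale("C", "collate")

-- "açaí" in two encodings, written as decimal escapes
local acai_latin1 = "a\231a\237"          -- Latin-1: ç = 231 (0xE7), í = 237 (0xED)
local acai_utf8   = "a\195\167a\195\173"  -- UTF-8:   ç = C3 A7, í = C3 AD

-- The second byte decides: "c" is 99, while ç starts with 231 (Latin-1) or 195 (UTF-8)
print(string.byte("c"), string.byte(acai_latin1, 2), string.byte(acai_utf8, 2))  --> 99  231  195

-- Both encodings give the same answers as the first two lines of the ideone run
print("acai" < acai_latin1, acai_latin1 < "acorde")  --> true  false
print("acai" < acai_utf8, acai_utf8 < "acorde")      --> true  false
```

So without a suitable locale, "acai" < "açaí" is true but "açaí" < "acorde" is false; the book's ordering needs locale-specific collation rules.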
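Note that Lua does not define an order for arrays (tables) either: comparing two tables with < raises an error unless they have an __lt metamethod.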
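A short sketch of the table case in standard Lua: a plain comparison fails, and an __lt metamethod supplies an order. The element-by-element rule below is a hypothetical choice for illustration, not something Lua provides.

```lua
local a, b = {1, 2}, {1, 3}

-- Comparing two plain tables raises an error ("attempt to compare two table values")
local ok, err = pcall(function() return a < b end)
print(ok, err)

-- Hypothetical order: compare element by element, shorter table first on a common prefix
local mt = {}
mt.__lt = function(x, y)
  for i = 1, math.min(#x, #y) do
    if x[i] ~= y[i] then return x[i] < y[i] end
  end
  return #x < #y
end
setmetatable(a, mt)
setmetatable(b, mt)
print(a < b)  --> true
```

Both tables share the same metatable, which also keeps the example working in Lua 5.1, where both operands must have the same __lt metamethod.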
You can write string literals in Lua in three ways: with single quotes, with double quotes, or by enclosing the text between [[ and ]].
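The three literal forms produce identical strings; long brackets additionally skip escape processing, may span lines, and can take = signs when the text itself contains ]]. A small sketch:

```lua
local s1 = 'hello'
local s2 = "hello"
local s3 = [[hello]]
print(s1 == s2, s2 == s3)  --> true  true

-- Long brackets may span lines; a newline right after the opening [[ is skipped
local long = [[
line 1
line 2]]
print(long)  -- prints "line 1" and "line 2" on two lines

-- Add = signs to the brackets when the text contains ]]
local tricky = [==[a ]] b]==]
print(tricky)  --> a ]] b
```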
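In Lua, == on strings returns true if the contents of the strings are equal. As was pointed out in the comments, Lua interns strings, so two strings with the same value are usually the very same object internally; Lua 5.1 interns every string, while newer versions intern only short strings and compare long ones by content. Either way, == compares values.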
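A quick check that equality depends only on contents, whether a string was built at run time or is long enough to skip interning:

```lua
local built = table.concat({"he", "llo"})  -- constructed at run time
local literal = "hello"
print(built == literal)            --> true
print(rawequal(built, literal))    --> true: no metamethods involved, still compared by value

-- Long strings compare by content too, interned or not
print(string.rep("x", 100) == string.rep("x", 100))  --> true
```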
One of the most used functions in Lua is table.sort, provided by the standard table library. It takes a table as its argument and sorts the elements of its array part in place. It also accepts an optional second argument, a function known as the order function, which receives two elements and returns true when the first must come before the second.
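A minimal sketch of both forms. Without an order function, table.sort uses <, so for strings the result depends on the current collation locale; the example pins it to "C" to make the output predictable. The length-based order function is just an illustration.

```lua
-- Byte-wise collation: uppercase letters sort before lowercase ones
os.setlocale("C", "collate")

local words = {"pear", "apple", "Fig", "banana"}
table.sort(words)                      -- default order, using <
print(table.concat(words, ", "))       --> Fig, apple, banana, pear

-- With an order function: shorter words first, ties broken alphabetically
local by_length = {"pear", "apple", "Fig", "banana"}
table.sort(by_length, function(a, b)
  if #a ~= #b then return #a < #b end
  return a < b
end)
print(table.concat(by_length, ", "))   --> Fig, pear, apple, banana
```

The order function must be consistent (never return true for both a, b and b, a); otherwise table.sort may raise "invalid order function for sorting" or produce a scrambled result.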
In the third edition of PiL, this statement has been modified:
For instance, with a Portuguese Latin-1 locale, we have "acai" < "açaí" < "acorde".
So the locale needs to be set to Portuguese Latin-1 accordingly:
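-- Default "C" locale: strings are compared byte by byte
print("acai" < "açaí")
print("açaí" < "acorde")
-- os.setlocale returns the name of the new locale, or nil if it is not available
print(os.setlocale("pt_PT"))
-- Comparisons now follow the locale's collation rules
print("acai" < "açaí")
print("açaí" < "acorde")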
On ideone, the result is:
true
false
pt_PT.iso88591
false
true
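But the order of "acai" and "açaí" now seems to differ from the book. (One possible explanation: if the script is saved as UTF-8, "ç" is stored as the two bytes C3 A7, and a Latin-1 locale collates them as two separate characters, "Ã" followed by "§", rather than as "ç".)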
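Because locale names vary between systems and os.setlocale returns nil when a locale is not installed, it can help to try several names and check which one took effect. A sketch with a hypothetical helper, set_collation; the candidate names are assumptions and may not exist on a given machine.

```lua
-- Hypothetical helper: try each name for the "collate" category only
local function set_collation(candidates)
  for _, name in ipairs(candidates) do
    local result = os.setlocale(name, "collate")
    if result then
      return result  -- the name the C library reports for the new locale
    end
  end
  return nil         -- none of the candidates is available
end

local chosen = set_collation({"pt_PT.ISO-8859-1", "pt_PT", "pt_PT.UTF-8"})
print(chosen or "no Portuguese locale available; comparisons stay byte-wise")

-- Restore byte-wise collation afterwards
os.setlocale("C", "collate")
```

Limiting the change to "collate" leaves number formatting ("numeric") untouched, which matters because some locales use a comma as the decimal separator and that can affect number/string conversions.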
You reference a code page, which maps code points to characters. Certainly code points, being a finite set of non-negative integers, are well-ordered, distinct entities. However, the order of code points is not the order of characters.
Characters have a collation order, which is a weak ordering rather than a strict one: characters can be "equal" without being the same. Collation is a user-facing convention that varies by locale (and over time).
Strings are even more complicated because some character sets (e.g. Unicode) have combining characters. That allows a "character" to be represented either as a single precomposed code point or as a base character followed by combining characters. For example, "ä" (U+00E4) vs "a" followed by U+0308 COMBINING DIAERESIS. Since they represent the same conceptual character, they should be considered even more equal than "ä" vs "a".
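Lua itself does not see that equivalence: its strings are byte sequences, so the two spellings of "ä" compare as different. A sketch using UTF-8 byte escapes; treating them as equal would need Unicode normalization (e.g. NFC), which the Lua standard library does not provide.

```lua
-- Two UTF-8 spellings of "ä", written as decimal byte escapes
local precomposed = "\195\164"   -- U+00E4 LATIN SMALL LETTER A WITH DIAERESIS (C3 A4)
local decomposed  = "a\204\136"  -- "a" + U+0308 COMBINING DIAERESIS (CC 88)

print(precomposed == decomposed)  --> false: Lua compares bytes, not characters
print(#precomposed, #decomposed)  --> 2  3 (lengths in bytes)

-- Lua 5.3+ has a utf8 library that counts code points (still no normalization)
if utf8 then
  print(utf8.len(precomposed), utf8.len(decomposed))  --> 1  2
end
```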
In Spanish, "ch", "rr" and "ll" used to be treated as letters of the alphabet, and words were ordered accordingly; now they are not, but "ñ" still is.
Similarly, in the past it was not uncommon for English-speakers to sort surnames beginning with "Mc" and "Mac" after others beginning with "M".
Software libraries have to deal with such things because that's what users want. Thankfully, some of the older conventions have fallen from use.
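As an illustration of how software can encode a rule like the old Spanish one, the sketch below treats "ch" and "ll" as single units that sort after "c" and "l". It is a hypothetical sort-key trick for lowercase ASCII words only, not how real collation libraries implement contractions.

```lua
os.setlocale("C", "collate")  -- keys are compared byte by byte

-- Hypothetical key: "ch" -> "c~", "ll" -> "l~"; "~" (126) is above every lowercase letter,
-- so "ch..." sorts after all other "c..." words and before "d"
local function traditional_key(word)
  return (word:gsub("ch", "c~"):gsub("ll", "l~"))
end

local function traditional_less(a, b)
  local ka, kb = traditional_key(a), traditional_key(b)
  if ka ~= kb then return ka < kb end
  return a < b
end

local words = {"chico", "dedo", "cuna", "casa"}
table.sort(words, traditional_less)
print(table.concat(words, " "))  --> casa cuna chico dedo

table.sort(words)                -- plain byte order, matching the modern rule here
print(table.concat(words, " "))  --> casa chico cuna dedo
```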
So, a locale could very well have collation rules that result in "acai" < "açaí" < "acorde" if "c" has the same sort order as "ç" but "i" comes before "í". This particular case seems strange, but since such rules are possible in general, our code has to allow for them.
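One way to get exactly that ordering is a two-level comparison: compare with accents folded to base letters first, and only on a tie fall back to the raw bytes, which puts the unaccented spelling first. This is a hypothetical sketch for two UTF-8 letters (the FOLD table and function names are made up for the example); real locales use full per-character weight tables, such as those of the Unicode Collation Algorithm.

```lua
os.setlocale("C", "collate")  -- both levels below compare bytes

-- Hypothetical fold table: UTF-8 accented letters mapped to base letters
local FOLD = {
  ["\195\167"] = "c",  -- ç (C3 A7)
  ["\195\173"] = "i",  -- í (C3 AD)
}

-- Level 1 key: known accented letters folded; other two-byte sequences kept as is
local function primary_key(s)
  return (s:gsub("\195.", FOLD))
end

-- Compare base letters first; on a tie, compare the raw bytes
local function collate_less(a, b)
  local ka, kb = primary_key(a), primary_key(b)
  if ka ~= kb then
    return ka < kb
  end
  return a < b
end

local acai_accented = "a\195\167a\195\173"  -- "açaí" in UTF-8
local words = {"acorde", acai_accented, "acai"}
table.sort(words, collate_less)
print(words[1], words[3])  --> acai  acorde (the accented word ends up in the middle)
print(acai_accented < "acorde", collate_less(acai_accented, "acorde"))  --> false  true
```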