 

What does it mean that string and character comparisons in Swift are not locale-sensitive?

Tags: string, swift

I started learning the Swift language and I am very curious: what does it mean that string and character comparisons in Swift are not locale-sensitive? Does it mean that all the characters are stored in Swift as UTF-8 characters?

Dmytro Plekhotkin asked Sep 07 '14 19:09


People also ask

What does locale sensitive mean?

A Locale object represents a specific geographical, political, or cultural region. An operation that requires a Locale to perform its task is called locale-sensitive and uses the Locale to tailor information for the user.

What is a String in Swift?

A string is a series of characters, such as "hello, world" or "albatross". Swift strings are represented by the String type. The contents of a String can be accessed in various ways, including as a collection of Character values.
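Because a String is a collection of Character values, it can be iterated directly. A minimal sketch (the example word is arbitrary) that walks a string containing a combining accent, showing that "e" followed by U+0301 COMBINING ACUTE ACCENT counts as a single Character:

```swift
// "e" followed by U+0301 COMBINING ACUTE ACCENT forms one Character: "é".
let word = "cafe\u{301}"

// Iterating a String visits its Character values one at a time.
var characters: [Character] = []
for character in word {
    characters.append(character)
}

print(characters.count)  // 4
print(characters)
```

Character equality uses canonical equivalence, so the last element compares equal to the precomposed "é" (U+00E9) even though it is stored as two scalars.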

Is string mutable in Swift?

Strings in Swift (and C++) are value types. Strings in Java and C#, by contrast, are always immutable, even if the reference is not declared final or readonly. Strings in Swift (and C++) can be mutable or immutable depending on how they are declared (let vs. var in Swift).
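A short sketch of both points, let vs. var and value semantics (the variable names are illustrative only):

```swift
// A `let` constant can never be changed; uncommenting the next line is a compile error.
let constant = "hello"
// constant.append("!")

// A `var` can be mutated in place.
var original = "hello"

// Strings are value types: assignment copies the value, so the two
// variables evolve independently afterwards.
var copy = original
copy += ", world"
original.append("!")

print(constant)  // hello
print(original)  // hello!
print(copy)      // hello, world
```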

How do I count the number of characters in a string in Swift?

To get the length of a String in Swift, use its count property. count is an Int representing the number of characters in the string.
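count counts Characters (extended grapheme clusters), which is not the same as the number of Unicode scalars or encoded code units. A minimal sketch using the Swedish flag, which is two regional-indicator scalars forming one Character; the unicodeScalars, utf16 and utf8 views expose the lower levels:

```swift
// The Swedish flag: regional indicators S (U+1F1F8) and E (U+1F1EA).
let flag = "\u{1F1F8}\u{1F1EA}"

print(flag.count)                 // 1  (Characters)
print(flag.unicodeScalars.count)  // 2  (Unicode scalars)
print(flag.utf16.count)           // 4  (UTF-16 code units)
print(flag.utf8.count)            // 8  (UTF-8 code units)
```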


2 Answers

(All code examples have been updated for Swift 3.)

Comparing Swift strings with < does a lexicographical comparison based on the so-called "Unicode Normalization Form D" (NFD), which can be computed with decomposedStringWithCanonicalMapping.

For example, the decomposition of

"ä" = U+00E4 = LATIN SMALL LETTER A WITH DIAERESIS

is the sequence of two Unicode code points

U+0061,U+0308 = LATIN SMALL LETTER A + COMBINING DIAERESIS
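This decomposition can be checked directly. A small sketch, assuming Foundation is available (it provides decomposedStringWithCanonicalMapping on String): the two spellings of "ä" have different code points, decompose to the same NFD sequence, and compare equal with ==:

```swift
import Foundation

let single = "\u{E4}"      // LATIN SMALL LETTER A WITH DIAERESIS
let combined = "a\u{308}"  // LATIN SMALL LETTER A + COMBINING DIAERESIS

// The raw code points differ.
let singleScalars = single.unicodeScalars.map { $0.value }      // [0xE4]
let combinedScalars = combined.unicodeScalars.map { $0.value }  // [0x61, 0x308]

// After canonical decomposition (NFD), the precomposed form becomes U+0061 U+0308.
let nfd = single.decomposedStringWithCanonicalMapping.unicodeScalars.map { $0.value }

print(singleScalars == combinedScalars)  // false
print(nfd == combinedScalars)            // true
// String equality uses canonical equivalence, not raw code points.
print(single == combined)                // true
```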

For demonstration purposes, I have written a small String extension which dumps the contents of the String as a comma-separated list of its Unicode code points (in hexadecimal):

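import Foundation  // needed for String(format:)

extension String {
    // The Unicode scalars of the string as comma-separated hex values, e.g. "0061,0308".
    var unicodeData: String {
        return self.unicodeScalars.map {
            String(format: "%04X", $0.value)
        }.joined(separator: ",")
    }
}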

Now let's take some strings and sort them with <:

let someStrings = ["ǟ", "ä", "ã", "a", "ă", "b"].sorted()
print(someStrings)
// ["a", "ã", "ă", "ä", "ǟ", "b"]

and dump the Unicode code points of each string (in original and decomposed form) in the sorted array:

for str in someStrings {
    print("\(str)  \(str.unicodeData)  \(str.decomposedStringWithCanonicalMapping.unicodeData)")
}

The output

a  0061  0061
ã  00E3  0061,0303
ă  0103  0061,0306
ä  00E4  0061,0308
ǟ  01DF  0061,0308,0304
b  0062  0062

nicely shows that the comparison is done by a lexicographic ordering of the Unicode code points in the decomposed form.

This is also true for strings of more than one character, as the following example shows. With

let someStrings = ["ǟψ", "äψ", "ǟx", "äx"].sorted()

the output of the above loop is

äx  00E4,0078  0061,0308,0078
ǟx  01DF,0078  0061,0308,0304,0078
ǟψ  01DF,03C8  0061,0308,0304,03C8
äψ  00E4,03C8  0061,0308,03C8

which means that

"äx" < "ǟx", but "äψ" > "ǟψ"

(which was unexpected, at least for me).
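To make the rule explicit, here is a hypothetical helper nfdPrecedes(_:_:) (not a standard library API) that orders two strings by comparing the code points of their NFD forms lexicographically. It is only a sketch of the ordering described above, assuming Foundation for decomposedStringWithCanonicalMapping; it is not guaranteed to agree with the built-in < in every Swift version.

```swift
import Foundation

// Hypothetical helper: lexicographic comparison of NFD code points.
func nfdPrecedes(_ lhs: String, _ rhs: String) -> Bool {
    let l = lhs.decomposedStringWithCanonicalMapping.unicodeScalars.map { $0.value }
    let r = rhs.decomposedStringWithCanonicalMapping.unicodeScalars.map { $0.value }
    return l.lexicographicallyPrecedes(r)
}

let ordered = ["ǟψ", "äψ", "ǟx", "äx"].sorted(by: nfdPrecedes)
print(ordered)                  // ["äx", "ǟx", "ǟψ", "äψ"]
print(nfdPrecedes("äx", "ǟx"))  // true:  U+0078 < U+0304 at the third scalar
print(nfdPrecedes("ǟψ", "äψ"))  // true:  U+0304 < U+03C8 at the third scalar
```

The third scalar decides both comparisons: "x" (U+0078) is smaller than the combining macron (U+0304), but "ψ" (U+03C8) is larger, which is why the order flips.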

Finally, let's compare this with a locale-sensitive ordering, for example Swedish:

let locale = Locale(identifier: "sv") // svenska
var someStrings = ["ǟ", "ä", "ã", "a", "ă", "b"]
someStrings.sort {
    $0.compare($1, locale: locale) == .orderedAscending
}

print(someStrings)
// ["a", "ă", "ã", "b", "ä", "ǟ"]

As you see, the result is different from the Swift < sorting.

Martin R answered Nov 14 '22 02:11


Changing the locale can change the alphabetical order. For example, a case-sensitive comparison can appear case-insensitive because of the locale or, more generally, the same two strings can end up in a different alphabetical order.
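A minimal sketch with ASCII-only strings: Swift's < takes no locale and compares code points, so all uppercase letters sort before all lowercase ones. A locale-aware comparison via Foundation's compare(_:options:range:locale:) applies the locale's collation rules instead, which typically place "a" next to "A". The identifier "en" is an arbitrary choice, and the exact order within each upper/lowercase pair depends on the platform's collation data, so no output is claimed for that part.

```swift
import Foundation

let letters = ["b", "A", "a", "B"]

// Swift's `<` compares Unicode scalar values; it takes no locale.
// "A" (U+0041) and "B" (U+0042) come before "a" (U+0061) and "b" (U+0062).
let codePointOrder = letters.sorted()
print(codePointOrder)  // ["A", "B", "a", "b"]

// A locale-aware comparison uses the locale's collation rules, which
// typically group "a" with "A" and "b" with "B".
let locale = Locale(identifier: "en")
let localeOrder = letters.sorted {
    $0.compare($1, locale: locale) == .orderedAscending
}
print(localeOrder)
```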

Miro Lehtonen answered Nov 14 '22 03:11