I started learning Swift language and I am very curious What does it mean that string and character comparisons in Swift are not locale-sensitive? Does it mean that all the characters are stored in Swift like UTF-8 characters?
A Locale object represents a specific geographical, political, or cultural region. An operation that requires a Locale to perform its task is called locale-sensitive and uses the Locale to tailor information for the user.
A string is a series of characters, such as "hello, world" or "albatross" . Swift strings are represented by the String type. The contents of a String can be accessed in various ways, including as a collection of Character values.
Strings in Swift (and C++) are value types. 2) Strings in Java and C# are always immutable even if the reference is not declared final or readonly. Strings in Swift (and C++) can be mutable or immutable depending on how they are declared (let vs var in Swift).
Swift – String Length/Count To get the length of a String in Swift, use count property of the string. count property is an integer value representing the number of characters in this string.
(All code examples updated for Swift 3 now.)
Comparing Swift strings with <
does a lexicographical comparison
based on the so-called "Unicode Normalization Form D" (which can be computed with
decomposedStringWithCanonicalMapping
)
For example, the decomposition of
"ä" = U+00E4 = LATIN SMALL LETTER A WITH DIAERESIS
is the sequence of two Unicode code points
U+0061,U+0308 = LATIN SMALL LETTER A + COMBINING DIAERESIS
For demonstration purposes, I have written a small String extension which dumps the contents of the String as an array of Unicode code points:
extension String {
var unicodeData : String {
return self.unicodeScalars.map {
String(format: "%04X", $0.value)
}.joined(separator: ",")
}
}
Now lets take some strings, sort them with <
:
let someStrings = ["ǟψ", "äψ", "ǟx", "äx"].sorted()
print(someStrings)
// ["a", "ã", "ă", "ä", "ǟ", "b"]
and dump the Unicode code points of each string (in original and decomposed form) in the sorted array:
for str in someStrings {
print("\(str) \(str.unicodeData) \(str.decomposedStringWithCanonicalMapping.unicodeData)")
}
The output
äx 00E4,0078 0061,0308,0078
ǟx 01DF,0078 0061,0308,0304,0078
ǟψ 01DF,03C8 0061,0308,0304,03C8
äψ 00E4,03C8 0061,0308,03C8
nicely shows that the comparison is done by a lexicographic ordering of the Unicode code points in the decomposed form.
This is also true for strings of more than one character, as the following example shows. With
let someStrings = ["ǟψ", "äψ", "ǟx", "äx"].sorted()
the output of above loop is
äx 00E4,0078 0061,0308,0078
ǟx 01DF,0078 0061,0308,0304,0078
ǟψ 01DF,03C8 0061,0308,0304,03C8
äψ 00E4,03C8 0061,0308,03C8
which means that
"äx" < "ǟx", but "äψ" > "ǟψ"
(which was at least unexpected for me).
Finally let's compare this with a locale-sensitive ordering, for example swedish:
let locale = Locale(identifier: "sv") // svenska
var someStrings = ["ǟ", "ä", "ã", "a", "ă", "b"]
someStrings.sort {
$0.compare($1, locale: locale) == .orderedAscending
}
print(someStrings)
// ["a", "ă", "ã", "b", "ä", "ǟ"]
As you see, the result is different from the Swift <
sorting.
Changing the locale can change the alphabetical order, e.g. a case-sensitive comparison can appear case-insensitive because of the locale, or more generally, the alphabetical order of two strings is different.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With