I ran into this example where s1 < s2 and s2 < s3 but (s1 < s3) is false:
var str1 = "あいかぎ"
var str2 = "あいかくしつ"
var str3 = "あいがみ:"
print(str1 < str2) // True
print(str2 < str3) // True
print(str1 < str3) // False (?)
Is this a bug, or is it true that we cannot rely on string comparison being transitive? (This breaks my sorting of a string array.) I'm running Swift 3.
Update: all of these are False
print(str1 < str3) // False (?)
print(str1 == str3) // False (?)
print(str1 > str3) // False (?)
So some strings are not comparable with each other?
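To look for more cases like this, a brute-force check over a small sample can help. This is only a sketch: `transitivityViolations` and `isTrichotomous` are hypothetical helper names, not standard library API, and the results for the three strings above will depend on the Swift version and the ICU build in use.

```swift
// Hypothetical helper: returns every ordered triple (a, b, c) from `items`
// for which a < b and b < c hold but a < c does not.
func transitivityViolations<T: Comparable>(in items: [T]) -> [(T, T, T)] {
    var violations: [(T, T, T)] = []
    for a in items {
        for b in items where a < b {
            for c in items where b < c {
                if !(a < c) {
                    violations.append((a, b, c))
                }
            }
        }
    }
    return violations
}

// Hypothetical helper: true when exactly one of <, ==, > holds for the pair.
func isTrichotomous<T: Comparable>(_ a: T, _ b: T) -> Bool {
    let outcomes = [a < b, a == b, a > b]
    return outcomes.filter { $0 }.count == 1
}

let samples = ["あいかぎ", "あいかくしつ", "あいがみ:"]
print(transitivityViolations(in: samples).count) // 0 if < is transitive on the sample
print(isTrichotomous(samples[0], samples[2]))    // true if the pair is properly ordered
```

The triple loop is cubic in the sample size, so this is only practical for a handful of suspicious strings, not for a whole data set.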
Update: a comment on "How does the Swift string more than operator work" pointed out that the source code for the < operator is in https://github.com/apple/swift/blob/master/stdlib/public/core/String.swift, and that the comparison is handled by _swift_stdlib_unicode_compare_utf8_utf8 in https://github.com/apple/swift/blob/master/stdlib/public/stubs/UnicodeNormalization.cpp
Update: These are true
print(str1 >= str3) // True
print(str1 <= str3) // True
Update: there is an issue with String.localizedCompare() too. There are two strings where s1 = s2 but s2 > s1:
str1 = "bảo toàn"
str2 = "bảo tồn"
print(str1.localizedCompare(str2) == .orderedSame) // true
print(str2.localizedCompare(str1) == .orderedDescending) // true
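The same kind of check can be written for comparators that return a ComparisonResult: swapping the arguments should mirror the result. A minimal sketch, assuming Foundation is available; `mirrored` and `isSymmetric` are hypothetical names introduced here for illustration, and the outcome for these two strings depends on the locale and platform.

```swift
import Foundation

// Hypothetical helper: the result compare(b, a) should give
// when compare(a, b) gave `result`.
func mirrored(_ result: ComparisonResult) -> ComparisonResult {
    switch result {
    case .orderedAscending: return .orderedDescending
    case .orderedDescending: return .orderedAscending
    case .orderedSame: return .orderedSame
    }
}

// Hypothetical helper: true when swapping the arguments mirrors the result.
func isSymmetric(_ a: String, _ b: String,
                 using compare: (String, String) -> ComparisonResult) -> Bool {
    return compare(b, a) == mirrored(compare(a, b))
}

let consistent = isSymmetric("bảo toàn", "bảo tồn") { $0.localizedCompare($1) }
print(consistent) // false would reproduce the inconsistency described above
```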
In Swift, you can check string and character equality with the "equal to" operator (==) and the "not equal to" operator (!=). Two string values are considered equal if they are canonically equivalent. In most cases, that means that if they look the same, they are equal. There are a few exceptions, but they are quite rare.
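As a small illustration of canonical equivalence, here are two spellings of "é" that are built from different Unicode scalars but compare equal. The variable names are just for this example.

```swift
let precomposed = "\u{E9}"   // "é" as the single scalar U+00E9
let decomposed = "e\u{301}"  // "e" followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)         // true
print(precomposed.unicodeScalars.count)  // 1
print(decomposed.unicodeScalars.count)   // 2
```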
It looks like this is not supposed to happen:
Q: Is transitive consistency maintained by the [Unicode Collation Algorithm]?
A: Yes, for any strings A, B, and C, if A < B and B < C, then A < C. However, implementers must be careful to produce implementations that accurately reproduce the results of the Unicode Collation Algorithm as they optimize their own algorithms. It is easy to perform careless optimizations — especially with Incremental Comparison algorithms — that fail this test. Other items to check are the proper distinction between the bases of accents. For example, the sequence <u-macron, u-diaeresis-macron> should compare as less than <u-macron-diaeresis, u-macron>; this is a secondary distinction, based on the weighting of the accents, which must be correctly associated with the primary weights of their respective base letters.
(Source: Unicode Collation FAQ)
In the UnicodeNormalization.cpp file, ucol_strcoll and ucol_strcollIter are called, which are part of the ICU project. This may be a bug in either the Swift standard library or the ICU project.
I reported this issue to the Swift Bug Tracker.
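Until that is resolved, one possible stopgap for sorting is to compare the Unicode scalar values directly, which gives a consistent (transitive) order because it is a plain lexicographic comparison of integers. This is a sketch under assumptions, not a fix for < itself: `scalarOrderPrecedes` is a hypothetical helper, the resulting order is not a linguistic collation, and canonically equivalent strings made of different scalars (which == treats as equal) are not treated as equal by it.

```swift
// Hypothetical helper: orders strings by their Unicode scalar values.
func scalarOrderPrecedes(_ a: String, _ b: String) -> Bool {
    return a.unicodeScalars.lexicographicallyPrecedes(b.unicodeScalars)
}

let words = ["あいがみ:", "あいかくしつ", "あいかぎ"]
let sortedWords = words.sorted(by: scalarOrderPrecedes)
print(sortedWords)
```

If a locale-aware order is needed, this is not a substitute. It only guarantees that the sort sees a consistent ordering.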