String comparison in Swift is not transitive

Tags:

swift

I ran into this example where s1 < s2 and s2 < s3 but (s1 < s3) is false:

var str1 = "あいかぎ"
var str2 = "あいかくしつ"
var str3 = "あいがみ:"

print(str1 < str2)       // true
print(str2 < str3)       // true
print(str1 < str3)       // false (?)

Is this a bug, or is it true that we cannot rely on string comparison being transitive? (This breaks my sorting of a string array.) I'm running Swift 3.

Update: all of these are false:

print(str1 < str3)       // false (?)
print(str1 == str3)      // false (?)
print(str1 > str3)       // false (?)

So some strings are not comparable with each other?

Update: a comment on "How does the Swift string more than operator work" pointed out that the source code for the < operator is in https://github.com/apple/swift/blob/master/stdlib/public/core/String.swift, and that the comparison is handled by _swift_stdlib_unicode_compare_utf8_utf8 in https://github.com/apple/swift/blob/master/stdlib/public/stubs/UnicodeNormalization.cpp

Update: These are true

print(str1 >= str3)  // True
print(str1 <= str3)  // True
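
To look for more triples like this in a larger word list, a brute-force check is enough. The following is a minimal sketch; transitivityViolations is a hypothetical helper name, not something from the standard library, and whether it reports anything for the sample strings depends on the Swift version and its comparison implementation.

```swift
// Hypothetical helper (not part of the standard library): returns every triple
// (a, b, c) for which a < b and b < c hold but a < c does not.
func transitivityViolations(in strings: [String]) -> [(String, String, String)] {
    var violations: [(String, String, String)] = []
    for a in strings {
        for b in strings where a < b {
            for c in strings where b < c && !(a < c) {
                violations.append((a, b, c))
            }
        }
    }
    return violations
}

// The three strings from the question; any word list can be substituted.
let samples = ["あいかぎ", "あいかくしつ", "あいがみ:"]
for (a, b, c) in transitivityViolations(in: samples) {
    print("\(a) < \(b) and \(b) < \(c), but not \(a) < \(c)")
}
```

The check is cubic in the number of strings, so it is only practical for small samples.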

Update: there is an issue with String.localizedCompare() too. There are two strings where s1 = s2 but s2 > s1:

str1 = "bảo toàn"
str2 = "bảo tồn"

print(str1.localizedCompare(str2) == .orderedSame) // true
print(str2.localizedCompare(str1) == .orderedDescending) // true
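
The same kind of check can be written for this case: comparing (a, b) and (b, a) should give mirrored results. A minimal sketch, assuming Foundation is available; isAntisymmetric is a hypothetical helper, and the result of localizedCompare depends on the current locale.

```swift
import Foundation

// Hypothetical helper: true when comparing (a, b) and (b, a) gives mirrored results.
func isAntisymmetric(_ a: String, _ b: String,
                     using compare: (String, String) -> ComparisonResult) -> Bool {
    switch (compare(a, b), compare(b, a)) {
    case (.orderedSame, .orderedSame),
         (.orderedAscending, .orderedDescending),
         (.orderedDescending, .orderedAscending):
        return true
    default:
        return false
    }
}

// Check the pair from the update above with the locale-aware comparison.
let consistent = isAntisymmetric("bảo toàn", "bảo tồn") { $0.localizedCompare($1) }
print(consistent)
```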
asked Sep 15 '17 01:09 by Pinch



1 Answer

It looks like this is not supposed to happen:

Q: Is transitive consistency maintained by the [Unicode Collation Algorithm]?

A: Yes, for any strings A, B, and C, if A < B and B < C, then A < C. However, implementers must be careful to produce implementations that accurately reproduce the results of the Unicode Collation Algorithm as they optimize their own algorithms. It is easy to perform careless optimizations — especially with Incremental Comparison algorithms — that fail this test. Other items to check are the proper distinction between the bases of accents. For example, the sequence <u-macron, u-diaeresis-macron> should compare as less than <u-macron-diaeresis, u-macron>; this is a secondary distinction, based on the weighting of the accents, which must be correctly associated with the primary weights of their respective base letters.

(Source: Unicode Collation FAQ)

The UnicodeNormalization.cpp file calls ucol_strcoll and ucol_strcollIter, which are part of the ICU project. So this may be a bug in either the Swift standard library or ICU. I reported the issue on the Swift bug tracker.
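
Until that is resolved, one possible way to keep sorting stable is to avoid the collation path and compare Unicode scalar values directly. This is only a sketch of a workaround, not a fix for the underlying issue: sortedByScalars is a hypothetical helper, the resulting order is by code point rather than by any locale's rules, and canonically equivalent strings with different scalar sequences are treated as different.

```swift
// Hypothetical helper: sorts by lexicographic order of Unicode scalar values.
// Scalars are ordered by their numeric value, so this ordering is transitive.
func sortedByScalars(_ strings: [String]) -> [String] {
    return strings.sorted { lhs, rhs in
        lhs.unicodeScalars.lexicographicallyPrecedes(rhs.unicodeScalars)
    }
}

let words = ["あいかぎ", "あいかくしつ", "あいがみ:"]
for word in sortedByScalars(words) {
    print(word)
}
```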

answered Sep 26 '22 11:09 by Palle