Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Swift countElements() return incorrect value when count flag emoji

let str1 = "πŸ‡©πŸ‡ͺπŸ‡©πŸ‡ͺπŸ‡©πŸ‡ͺπŸ‡©πŸ‡ͺπŸ‡©πŸ‡ͺ"
let str2 = "πŸ‡©πŸ‡ͺ.πŸ‡©πŸ‡ͺ.πŸ‡©πŸ‡ͺ.πŸ‡©πŸ‡ͺ.πŸ‡©πŸ‡ͺ."

println("\(countElements(str1)), \(countElements(str2))")

Result: 1, 10

But should not str1 have 5 elements?

The bug seems only occurred when I use the flag emoji.

like image 346
ZYiOS Avatar asked Nov 11 '14 10:11

ZYiOS


2 Answers

Update for Swift 4 (Xcode 9)

As of Swift 4 (tested with Xcode 9 beta) grapheme clusters break after every second regional indicator symbol, as mandated by the Unicode 9 standard:

let str1 = "πŸ‡©πŸ‡ͺπŸ‡©πŸ‡ͺπŸ‡©πŸ‡ͺπŸ‡©πŸ‡ͺπŸ‡©πŸ‡ͺ"
print(str1.count) // 5
print(Array(str1)) // ["πŸ‡©πŸ‡ͺ", "πŸ‡©πŸ‡ͺ", "πŸ‡©πŸ‡ͺ", "πŸ‡©πŸ‡ͺ", "πŸ‡©πŸ‡ͺ"]

Also String is a collection of its characters (again), so one can obtain the character count with str1.count.


(Old answer for Swift 3 and older:)

From "3 Grapheme Cluster Boundaries" in the "Standard Annex #29 UNICODE TEXT SEGMENTATION": (emphasis added):

A legacy grapheme cluster is defined as a base (such as A or γ‚«) followed by zero or more continuing characters. One way to think of this is as a sequence of characters that form a β€œstack”.

The base can be single characters, or be any sequence of Hangul Jamo characters that form a Hangul Syllable, as defined by D133 in The Unicode Standard, or be any sequence of Regional_Indicator (RI) characters. The RI characters are used in pairs to denote Emoji national flag symbols corresponding to ISO country codes. Sequences of more than two RI characters should be separated by other characters, such as U+200B ZWSP.

(Thanks to @rintaro for the link).

A Swift Character represents an extended grapheme cluster, so it is (according to this reference) correct that any sequence of regional indicator symbols is counted as a single character.

You can separate the "flags" by a ZERO WIDTH NON-JOINER:

let str1 = "πŸ‡©πŸ‡ͺ\u{200C}πŸ‡©πŸ‡ͺ"
print(str1.characters.count) // 2

or insert a ZERO WIDTH SPACE:

let str2 = "πŸ‡©πŸ‡ͺ\u{200B}πŸ‡©πŸ‡ͺ"
print(str2.characters.count) // 3

This solves also possible ambiguities, e.g. should "πŸ‡«β€‹πŸ‡·β€‹πŸ‡Ίβ€‹πŸ‡Έ" be "πŸ‡«β€‹πŸ‡·πŸ‡Ίβ€‹πŸ‡Έ" or "πŸ‡«πŸ‡·β€‹πŸ‡ΊπŸ‡Έ" ?

See also How to know if two emojis will be displayed as one emoji? about a possible method to count the number of "composed characters" in a Swift string, which would return 5 for your let str1 = "πŸ‡©πŸ‡ͺπŸ‡©πŸ‡ͺπŸ‡©πŸ‡ͺπŸ‡©πŸ‡ͺπŸ‡©πŸ‡ͺ".

like image 98
Martin R Avatar answered Oct 19 '22 19:10

Martin R


Here's how I solved that problem, for Swift 3:

let str = "πŸ‡©πŸ‡ͺπŸ‡©πŸ‡ͺπŸ‡©πŸ‡ͺπŸ‡©πŸ‡ͺπŸ‡©πŸ‡ͺ" //or whatever the string of emojis is
let range = str.startIndex..<str.endIndex
var length = 0
str.enumerateSubstrings(in: range, options: NSString.EnumerationOptions.byComposedCharacterSequences) { (substring, substringRange, enclosingRange, stop) -> () in
        length = length + 1
    }
print("Character Count: \(length)")

This fixes all the problems with character count and emojis, and is the simplest method I have found.

like image 4
James Arnold Avatar answered Oct 19 '22 20:10

James Arnold