
How can I use NSRegularExpression on Swift strings with variable-width Unicode characters?

I'm having trouble getting NSRegularExpression to match patterns on strings that contain wider Unicode characters, meaning ones that take more than one UTF-16 code unit, such as emoji. It looks like the problem is the range parameter -- Swift counts individual Unicode characters, while Objective-C treats strings as if they're made up of UTF-16 code units.

Here is my test string and two regular expressions:

let str = "dog🐶🐮cow"
let dogRegex = NSRegularExpression(pattern: "d.g", options: nil, error: nil)!
let cowRegex = NSRegularExpression(pattern: "c.w", options: nil, error: nil)!

I can match the first regex with no problems:

let dogMatch = dogRegex.firstMatchInString(str, options: nil, 
                   range: NSRange(location: 0, length: countElements(str)))
println(dogMatch?.range)  // (0, 3)

But the second fails with the same parameters, because the range I send it (0...7) covers only 8 of the string's 10 UTF-16 code units, so it isn't long enough to cover the whole string as far as NSRegularExpression is concerned:

let cowMatch = cowRegex.firstMatchInString(str, options: nil, 
                   range: NSRange(location: 0, length: countElements(str)))
println(cowMatch?.range)  // nil
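
The mismatch is easy to see by counting the string both ways. This is a minimal sketch, assuming current Swift (5 or later) with Foundation, where `count` stands in for the `countElements` call above and `utf16.count` for `utf16Count`; the variable names are only illustrative:

```swift
import Foundation

let str = "dog🐶🐮cow"

// Swift counts user-perceived characters (grapheme clusters).
let characterCount = str.count           // 8

// NSString and NSRange count UTF-16 code units; each emoji here lies
// outside the Basic Multilingual Plane and takes two code units.
let utf16Count = str.utf16.count         // 10
let nsLength = (str as NSString).length  // 10

// A range of length 8 therefore ends right after the "c" in "cow",
// which is why the "c.w" pattern cannot match inside it.
let truncated = (str as NSString).substring(
    with: NSRange(location: 0, length: characterCount))
print(characterCount, utf16Count, nsLength, truncated)
```

The last line shows what NSRegularExpression actually gets to search when the range length comes from the character count.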

If I use a different range I can make the match succeed:

let cowMatch2 = cowRegex.firstMatchInString(str, options: nil, 
                    range: NSRange(location: 0, length: str.utf16Count))
println(cowMatch2?.range)  // (7, 3)

but then I don't know how to extract the matched text out of the string, since that range falls outside the range of the Swift string.

Nate Cook asked Sep 17 '14 04:09


1 Answer

Turns out you can fight fire with fire. Using the Swift-native string's utf16Count property and the substringWithRange: method of NSString -- not String -- gets the right result. Here's the full working code:

let str = "dog🐶🐮cow"
let cowRegex = NSRegularExpression(pattern: "c.w", options: nil, error: nil)!

if let cowMatch = cowRegex.firstMatchInString(str, options: nil,
                      range: NSRange(location: 0, length: str.utf16Count)) {
    println((str as NSString).substringWithRange(cowMatch.range))
    // prints "cow"
}

(I figured this out in the process of writing the question; score one for rubber duck debugging.)
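
For readers on current Swift, the same idea can be written without casting to NSString. The sketch below is an illustration and not part of the original answer. It assumes Swift 5 or later with Foundation, where the initializer throws and the method is spelled `firstMatch(in:options:range:)`. The `NSRange(_:in:)` and `Range(_:in:)` initializers convert between `String.Index` ranges and UTF-16 based `NSRange` values; `fullRange` and `matchedText` are illustrative names.

```swift
import Foundation

let str = "dog🐶🐮cow"

// try! is tolerable here only because the pattern is a known-good literal.
let cowRegex = try! NSRegularExpression(pattern: "c.w")

// Build an NSRange spanning the whole string, measured in UTF-16 code units.
let fullRange = NSRange(str.startIndex..<str.endIndex, in: str)

var matchedText: String? = nil
if let cowMatch = cowRegex.firstMatch(in: str, options: [], range: fullRange),
   // Convert the UTF-16 based NSRange back into a Range<String.Index>.
   let swiftRange = Range(cowMatch.range, in: str) {
    matchedText = String(str[swiftRange])
    print(matchedText!)  // prints "cow"
}
```

Converting the match range back to a `Range<String.Index>` keeps the extracted text a native Swift `String`, at the cost of an optional that is nil if the range does not fall on valid string indices.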

Nate Cook answered Nov 05 '22 00:11