Reliable function to get position of substring in string in Swift

This is working well for English:

public static func posOf(needle: String, haystack: String) -> Int {
    return haystack.distance(from: haystack.startIndex, to: (haystack.range(of: needle)?.lowerBound)!)
}

But for text in other scripts, such as Hindi, the returned value is always too small. For example, "का" is counted as one unit instead of two.

posOf(needle: "काम", haystack: "वह बीना की खुली कोयला खदान में काम करता था।") // 21

I later use the 21 in NSRange(location:length:), where it needs to be 31 for the NSRange to work properly.

asked Dec 22 '16 at 11:12 by twharmon

1 Answer

A Swift String is a collection of Characters, and each Character represents an "extended Unicode grapheme cluster".

NSString is a collection of UTF-16 code units.

Example:

print("का".count) // 1 (in Swift 3 and earlier: "का".characters.count)
print(("का" as NSString).length) // 2

Swift String ranges are represented as Range<String.Index>, and NSString ranges are represented as NSRange.

Your function counts the number of Characters from the start of the haystack to the start of the needle, and that is different from the number of UTF-16 code units.
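
As a rough illustration of that difference, the following sketch counts the text that precedes the needle in both ways. The constant name `prefix` is only an illustrative choice; the string is the part of the question's haystack before "काम".

```swift
// The part of the haystack that precedes the needle "काम".
let prefix = "वह बीना की खुली कोयला खदान में "

// Characters (extended grapheme clusters): what posOf(needle:haystack:) counts.
print(prefix.count)       // 21

// UTF-16 code units: what NSString and NSRange count.
print(prefix.utf16.count) // 31
```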

If you need an "NSRange compatible" position, then the easiest method is to use the range(of:) method of NSString:

let haystack = "वह बीना की खुली कोयला खदान में काम करता था।"
let needle = "काम"

if let range = haystack.range(of: needle) {
    let pos = haystack.distance(from: haystack.startIndex, to: range.lowerBound)
    print(pos) // 21
}

let nsRange = (haystack as NSString).range(of: needle)
if nsRange.location != NSNotFound {
    print(nsRange.location) // 31
}
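
The NSString approach could be wrapped in a small function along the lines of the one in the question. This is only a sketch: the name `utf16Offset(of:in:)` is hypothetical, and it returns an optional instead of force-unwrapping so that a missing needle does not crash.

```swift
import Foundation

/// Hypothetical helper: the UTF-16 offset of the first occurrence of `needle`
/// in `haystack`, or nil if `needle` does not occur.
func utf16Offset(of needle: String, in haystack: String) -> Int? {
    let nsRange = (haystack as NSString).range(of: needle)
    return nsRange.location == NSNotFound ? nil : nsRange.location
}

let haystack = "वह बीना की खुली कोयला खदान में काम करता था।"
let needle = "काम"

if let location = utf16Offset(of: needle, in: haystack) {
    // The offset can be used directly in NSRange(location:length:).
    let nsRange = NSRange(location: location, length: (needle as NSString).length)
    print((haystack as NSString).substring(with: nsRange)) // काम
}
```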

Alternatively, use the utf16 view of the Swift string to count UTF-16 code units:

if let range = haystack.range(of: needle) {
    // Since Swift 4, String.Index is shared between the string and its views,
    // so the lower bound can be used with the utf16 view directly.
    let pos = haystack.utf16.distance(from: haystack.utf16.startIndex, to: range.lowerBound)
    print(pos) // 31
}

(See, for example, "NSRange to Range<String.Index>" for more methods to convert between Range<String.Index> and NSRange.)
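
One such conversion, assuming Swift 4 or later with Foundation, is the NSRange(_:in:) initializer together with its counterpart Range(_:in:). The sketch below reuses the haystack and needle from above and converts in both directions.

```swift
import Foundation

let haystack = "वह बीना की खुली कोयला खदान में काम करता था।"
let needle = "काम"

if let range = haystack.range(of: needle) {
    // Range<String.Index> -> NSRange (UTF-16 based location and length).
    let nsRange = NSRange(range, in: haystack)
    print(nsRange.location, nsRange.length) // 31 3

    // NSRange -> Range<String.Index>; nil if the NSRange does not fit the string.
    if let swiftRange = Range(nsRange, in: haystack) {
        print(haystack[swiftRange]) // काम
    }
}
```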

answered Nov 15 '22 at 03:11 by Martin R