Find the Range of the Nth word in a String

What I want is something like

"word1 word2 word3".rangeOfWord(2) => 6 to 10

The result could come as a Range or a tuple or whatever.

I'd rather not do the brute force of iterating over the characters and using a state machine. Why reinvent the lexer? Is there a better way?

Andrew Duncan asked Mar 13 '23 19:03


2 Answers

In your example, your words are unique, and you can use the following method:

let myString = "word1 word2 word3"
let wordNum = 2
let myRange = myString.rangeOfString(myString.componentsSeparatedByString(" ")[wordNum-1])
    // 6..<11
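
One caveat is worth seeing concretely: this lookup searches for the word's text, so it returns the first place that text occurs. A minimal sketch in current Swift syntax, assuming Foundation is imported (`components(separatedBy:)` and `range(of:)` are the present-day spellings of the calls above; the sample string is made up for illustration):

```swift
import Foundation

let sample = "word1 word2 word3 word2"
let words = sample.components(separatedBy: " ")

// Ask for the 4th word, which is the second "word2".
let wanted = words[3]

// range(of:) finds the first occurrence of the text "word2",
// not the occurrence at word position 4.
if let found = sample.range(of: wanted) {
    let offset = sample.distance(from: sample.startIndex, to: found.lowerBound)
    print(offset) // 6, although the 4th word starts at offset 18
}
```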

As pointed out by Andrew Duncan in the comments below, the first method is only valid if your words are unique. If you have non-unique words, you can use this somewhat less neat method:

let myString = "word1 word2 word3 word2 word1 word3 word1"
let wordNum = 7 // 2nd instance (out of 3) of "word1"
let arr = myString.componentsSeparatedByString(" ")
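// Offset of the target word: lengths of all preceding words plus one separator per preceding word.
let fromIndex = arr[0..<wordNum-1].map { $0.characters.count }.reduce(0, combine: +) + wordNum - 1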

let myRange = Range<String.Index>(start: myString.startIndex.advancedBy(fromIndex), end: myString.startIndex.advancedBy(fromIndex+arr[wordNum-1].characters.count))
let myWord = myString.substringWithRange(myRange) 
    // string "word1" (from range 36..<41)

Finally, let's use the latter to construct an extension of String, as you wished for in your question's example:

extension String {
    private func rangeOfNthWord(wordNum: Int, wordSeparator: String) -> Range<String.Index>? {
        // Split the receiver (self), not a variable from the enclosing scope.
        let arr = self.componentsSeparatedByString(wordSeparator)

        // Word numbers are 1-based; anything outside 1...arr.count has no range.
        if wordNum < 1 || arr.count < wordNum {
            return nil
        }
        else {
            // Lengths of all preceding words plus one separator per preceding word.
            let fromIndex = arr[0..<wordNum-1].map { $0.characters.count }.reduce(0, combine: +) + (wordNum - 1)*wordSeparator.characters.count
            return Range<String.Index>(start: self.startIndex.advancedBy(fromIndex), end: self.startIndex.advancedBy(fromIndex+arr[wordNum-1].characters.count))
        }
    }
}

let myString = "word1 word2 word3 word2 word1 word3 word1"
let wordNum = 7 // 2nd instance (out of 3) of "word1"

if let myRange = myString.rangeOfNthWord(wordNum, wordSeparator: " ") {
    // myRange: 36..<41
    print(myString.substringWithRange(myRange)) // prints "word1"
}

You can tweak the .rangeOfNthWord(...) method if the word separation is not uniform (say, some words are separated by two spaces "  ").
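
One possible tweak for repeated separators, sketched in current Swift syntax rather than the Swift 2 syntax above: `split(separator:)` drops empty pieces by default, and each resulting `Substring` shares its indices with the original string, so its bounds can serve directly as the range. The method name `rangeOfNthWordSkippingEmpty` is hypothetical, and the sketch assumes a single-character separator.

```swift
extension String {
    /// Range of the n-th (1-based) word, treating any run of `separator`
    /// characters as a single gap. Hypothetical helper for illustration.
    func rangeOfNthWordSkippingEmpty(_ n: Int, separator: Character = " ") -> Range<String.Index>? {
        // split(separator:) omits empty subsequences by default, so
        // consecutive separators do not produce empty "words".
        let words = split(separator: separator)
        guard n >= 1, n <= words.count else { return nil }

        // A Substring's indices are valid indices into the base string.
        let word = words[n - 1]
        return word.startIndex..<word.endIndex
    }
}

let spaced = "word1  word2 word3" // two spaces after "word1"
if let r = spaced.rangeOfNthWordSkippingEmpty(2) {
    print(spaced[r]) // word2
}
```

This avoids both the offset arithmetic and the uniqueness problem, at the cost of building the full array of substrings even when an early word is requested.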


As also pointed out in the comments below, the use of .rangeOfString(...) is not, per se, pure Swift. It is, however, by no means bad practice. From the Swift Language Guide - Strings and Characters:

Swift’s String type is bridged with Foundation’s NSString class. If you are working with the Foundation framework in Cocoa, the entire NSString API is available to call on any String value you create when type cast to NSString, as described in AnyObject. You can also use a String value with any API that requires an NSString instance.

See also the NSString class reference for the rangeOfString method:

// Swift Declaration:
func rangeOfString(_ searchString: String) -> NSRange
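
Note that the `NSString` method returns an `NSRange` counted in UTF-16 code units, whereas a Swift `String` is sliced with a `Range<String.Index>`. A short sketch of converting between the two in current Swift syntax, assuming Foundation is imported (the sample string is illustrative):

```swift
import Foundation

let text = "word1 word2 word3"

// Calling the NSString API directly yields an NSRange (location + length).
let nsRange = NSString(string: text).range(of: "word2")
print(nsRange.location, nsRange.length) // 6 5

// Range(_:in:) is failable: it returns nil when the NSRange does not
// correspond to valid positions in the Swift string.
if let range = Range(nsRange, in: text) {
    print(text[range]) // word2
}
```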

dfrib answered Mar 17 '23 04:03


I went ahead and wrote the state machine. (Grumble..) FWIW, here it is:

extension String {
    private func halfOpenIntervalOfBlock(n:Int, separator sep:Character? = nil) -> (Int, Int)? {
        enum State {
            case InSeparator
            case InPrecedingSeparator
            case InWord
            case InTarget
            case Done
        }

        guard n > 0 else {
            return nil
        }

        var state:State
        if n == 1 {
            state = .InPrecedingSeparator
        } else {
            state = .InSeparator
        }

        var separatorNum = 0
        var startIndex:Int = 0
        var endIndex:Int = 0

        for (i, c) in self.characters.enumerate() {
            let inSeparator:Bool
            // A bit inefficient to keep doing this test.
            if let s = sep {
                inSeparator = c == s
            } else {
                inSeparator = c == " " || c == "\n"
            }
            endIndex = i

            switch state {
            case .InPrecedingSeparator:
                if !inSeparator {
                    state = .InTarget
                    startIndex = i
                }

            case .InTarget:
                if inSeparator {
                    state = .Done
                }

            case .InWord:
                if inSeparator {
                    separatorNum += 1
                    if separatorNum == n - 1 {
                        state = .InPrecedingSeparator
                    } else {
                        state = .InSeparator
                    }
                }

            case .InSeparator:
                if !inSeparator {
                    state = .InWord
                }

            case .Done:
                break
            }

            if state == .Done {
                break
            }
        }

        if state == .Done {
            return (startIndex, endIndex)
        } else if state == .InTarget {
            return (startIndex, endIndex + 1) // We ran off end.
        } else {
            return nil
        }
    }

    func rangeOfWord(n:Int) -> Range<Index>? {
        guard let (s, e) = self.halfOpenIntervalOfBlock(n) else {
            return nil
        }
        let ss = self.startIndex.advancedBy(s)
        let ee = self.startIndex.advancedBy(e)
        return Range(start:ss, end:ee)
    }

}
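
The `rangeOfWord` wrapper turns integer offsets into `String.Index` values. In current Swift syntax that conversion is spelled `index(_:offsetBy:)`, and the `limitedBy:` variant returns `nil` rather than trapping when an offset runs past the end. A small sketch of that step alone (the helper name `range(fromOffset:toOffset:)` is hypothetical):

```swift
extension String {
    /// Converts a half-open pair of character offsets into a range of
    /// String.Index, or nil if either offset lies outside the string.
    func range(fromOffset start: Int, toOffset end: Int) -> Range<String.Index>? {
        guard start >= 0, start <= end,
              let lower = index(startIndex, offsetBy: start, limitedBy: endIndex),
              let upper = index(lower, offsetBy: end - start, limitedBy: endIndex)
        else { return nil }
        return lower..<upper
    }
}

let phrase = "word1 word2 word3"
if let r = phrase.range(fromOffset: 6, toOffset: 11) {
    print(phrase[r]) // word2
}
print(phrase.range(fromOffset: 6, toOffset: 99) == nil) // true
```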
Andrew Duncan answered Mar 17 '23 04:03