
Named capture groups in NSRegularExpression - get a range's group's name

Apple says that NSRegularExpression is based on the ICU Regular Expression library: https://developer.apple.com/library/ios/documentation/Foundation/Reference/NSRegularExpression_Class/

The pattern syntax currently supported is that specified by ICU. The ICU regular expressions are described at http://userguide.icu-project.org/strings/regexp.

That page (on icu-project.org) claims that Named Capture Groups are now supported, using the same syntax as .NET Regular Expressions:

(?<name>...) Named capture group. The <angle brackets> are literal - they appear in the pattern.

I have written a program that gets a single match with multiple ranges, which seem correct (though each range is returned twice, for reasons unknown), but the only information I have for each range is its index and its text range.

For example, the regex: ^(?<foo>foo)\.(?<bar>bar)\.(?<bar2>baz)$ with test string foo.bar.baz

Gives me these results:

Idx    Start    Length     Text
0      0        11         foo.bar.baz
1      0         3         foo
2      4         3         bar
3      8         3         baz

Is there any way to know that "baz" came from the capture-group bar2?

Dai asked Mar 07 '16 09:03


3 Answers

Named capture groups are supported since iOS 11. NSTextCheckingResult has the method open func range(withName name: String) -> NSRange.

Using the regex ^(?<foo>foo)\.(?<bar>bar)\.(?<bar2>baz)$ with the test string foo.bar.baz gives a single match with 4 ranges. Calling match.range(withName: "bar2") on that match returns the range of the string baz.
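As a minimal sketch of that lookup (assuming iOS 11 / macOS 10.13 or later and current Swift), the snippet below wraps the call in a small helper. The name capturedText(named:pattern:in:) is hypothetical and only used for illustration; the pattern and test string are the ones from the question.

```swift
import Foundation

// Pattern and input taken from the question.
let pattern = "^(?<foo>foo)\\.(?<bar>bar)\\.(?<bar2>baz)$"
let input = "foo.bar.baz"

// Hypothetical helper (not part of Foundation): returns the text captured by
// the named group, or nil if the pattern is invalid, nothing matches, or the
// group did not take part in the match.
func capturedText(named name: String, pattern: String, in input: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return nil }
    let fullRange = NSRange(input.startIndex..., in: input)
    guard let match = regex.firstMatch(in: input, options: [], range: fullRange) else { return nil }

    // range(withName:) reports a location of NSNotFound when the group is absent.
    let nsRange = match.range(withName: name)
    guard nsRange.location != NSNotFound, let range = Range(nsRange, in: input) else { return nil }
    return String(input[range])
}

let baz = capturedText(named: "bar2", pattern: pattern, in: input)
print(baz ?? "no match") // prints "baz"
```

Converting the NSRange with Range(_:in:) instead of using raw integer offsets keeps the lookup correct for strings that contain characters outside the Basic Multilingual Plane.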

jtmayer answered Nov 09 '22 22:11


I have built on the example created by Daniele Bernardini in his answer.

There are a number of changes:

  • First of all, the code is now compatible with Swift 3.
  • Daniele's code has a defect: it does not handle nested capture groups. I have made the regular expressions slightly less aggressive to allow for nesting of capture groups.
  • I prefer to receive the captured text itself rather than ranges, so I added a method named captureGroups() that returns a dictionary mapping each group name to its captured string.

    import Foundation
    
    extension String {
        func matchingStrings(regex: String) -> [[String]] {
            guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
            let nsString = self as NSString
            let results  = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))
            return results.map { result in
                (0..<result.numberOfRanges).map { result.rangeAt($0).location != NSNotFound
                    ? nsString.substring(with: result.rangeAt($0))
                    : ""
                }
            }
        }
    
        func range(from nsRange: NSRange) -> Range<String.Index>? {
            guard
                let from16 = utf16.index(utf16.startIndex, offsetBy: nsRange.location, limitedBy: utf16.endIndex),
                let to16 = utf16.index(utf16.startIndex, offsetBy: nsRange.location + nsRange.length, limitedBy: utf16.endIndex),
                let from = from16.samePosition(in: self),
                let to = to16.samePosition(in: self)
                else { return nil }
            return from ..< to
        }
    
    }
    
    extension NSRegularExpression {
        typealias GroupNamesSearchResult = (NSTextCheckingResult, NSTextCheckingResult, Int)
    
        private func textCheckingResultsOfNamedCaptureGroups() -> [String:GroupNamesSearchResult] {
            var groupnames = [String:GroupNamesSearchResult]()
    
            guard let greg = try? NSRegularExpression(pattern: "^\\(\\?<([\\w\\a_-]*)>$", options: NSRegularExpression.Options.dotMatchesLineSeparators) else {
                // This never happens but the alternative is to make this method throwing
                return groupnames
            }
            guard let reg = try? NSRegularExpression(pattern: "\\(.*?>", options: NSRegularExpression.Options.dotMatchesLineSeparators) else {
                // This never happens but the alternative is to make this method throwing
                return groupnames
            }
            let m = reg.matches(in: self.pattern, options: NSRegularExpression.MatchingOptions.withTransparentBounds, range: NSRange(location: 0, length: self.pattern.utf16.count))
            for (n,g) in m.enumerated() {
                let r = self.pattern.range(from: g.rangeAt(0))
                let gstring = self.pattern.substring(with: r!)
                let gmatch = greg.matches(in: gstring, options: NSRegularExpression.MatchingOptions.anchored, range: NSRange(location: 0, length: gstring.utf16.count))
                if gmatch.count > 0{
                    let r2 = gstring.range(from: gmatch[0].rangeAt(1))!
                    groupnames[gstring.substring(with: r2)] = (g, gmatch[0],n)
                }
    
            }
            return groupnames
        }
    
        func indexOfNamedCaptureGroups() throws -> [String:Int] {
            var groupnames = [String:Int]()
            for (name,(_,_,n)) in try self.textCheckingResultsOfNamedCaptureGroups() {
                groupnames[name] = n + 1
            }
            return groupnames
        }
    
        func rangesOfNamedCaptureGroups(match:NSTextCheckingResult) throws -> [String:Range<Int>] {
            var ranges = [String:Range<Int>]()
            for (name,(_,_,n)) in try self.textCheckingResultsOfNamedCaptureGroups() {
                ranges[name] = match.rangeAt(n+1).toRange()
            }
            return ranges
        }
    
        private func nameForIndex(_ index: Int, from: [String:GroupNamesSearchResult]) -> String? {
            for (name,(_,_,n)) in from {
                if (n + 1) == index {
                    return name
                }
            }
            return nil
        }
    
        func captureGroups(string: String, options: NSRegularExpression.MatchingOptions = []) -> [String:String] {
            return captureGroups(string: string, options: options, range: NSRange(location: 0, length: string.utf16.count))
        }
    
        func captureGroups(string: String, options: NSRegularExpression.MatchingOptions = [], range: NSRange) -> [String:String] {
            var dict = [String:String]()
            let matchResult = matches(in: string, options: options, range: range)
            let names = self.textCheckingResultsOfNamedCaptureGroups()
            for m in matchResult {
                for i in (0..<m.numberOfRanges) {
                    // Skip groups that did not take part in the match instead of force-unwrapping
                    let nsRange = m.rangeAt(i)
                    guard nsRange.location != NSNotFound, let r2 = string.range(from: nsRange) else { continue }
                    let g = string.substring(with: r2)
                    if let name = nameForIndex(i, from: names) {
                        dict[name] = g
                    }
                }
            }
            return dict
        }
    }
    

An example of using the new method captureGroups() is:

    let node = "'test_literal'"
    let regex = try NSRegularExpression(pattern: "^(?<all>(?<delimiter>'|\")(?<value>.*)(?:\\k<delimiter>))$", options: NSRegularExpression.Options.dotMatchesLineSeparators)
    let match2 = regex.captureGroups(string: node, options: NSRegularExpression.MatchingOptions.anchored)
    print(match2)

And it will print:

["delimiter": "\'", "all": "\'test_literal\'", "value": "test_literal"]

Fred Appelman answered Nov 09 '22 23:11


I was facing the same issue and ended up rolling my own solution. Feel free to comment or improve ;-)

extension NSRegularExpression {
    typealias GroupNamesSearchResult = (NSTextCheckingResult, NSTextCheckingResult, Int)

    private func textCheckingResultsOfNamedCaptureGroups() throws -> [String:GroupNamesSearchResult] {
        var groupnames = [String:GroupNamesSearchResult]()

        let greg = try NSRegularExpression(pattern: "^\\(\\?<([\\w\\a_-]*)>.*\\)$", options: NSRegularExpressionOptions.DotMatchesLineSeparators)
        let reg = try NSRegularExpression(pattern: "\\([^\\(\\)]*\\)", options: NSRegularExpressionOptions.DotMatchesLineSeparators)
        let m = reg.matchesInString(self.pattern, options: NSMatchingOptions.WithTransparentBounds, range: NSRange(location: 0, length: self.pattern.utf16.count))
        for (n,g) in m.enumerate() {
            // Bridge to NSString so that NSRange values can be used directly
            let gstring = (self.pattern as NSString).substringWithRange(g.rangeAtIndex(0))
            let gmatch = greg.matchesInString(gstring, options: NSMatchingOptions.Anchored, range: NSRange(location: 0, length: gstring.utf16.count))
            if gmatch.count > 0 {
                groupnames[(gstring as NSString).substringWithRange(gmatch[0].rangeAtIndex(1))] = (g,gmatch[0],n)
            }

        }
        return groupnames
    }
    func indexOfNamedCaptureGroups() throws -> [String:Int] {
        var groupnames = [String:Int]()
        for (name,(_,_,n)) in try self.textCheckingResultsOfNamedCaptureGroups() {
            groupnames[name] = n + 1
        }
        //print(groupnames)
        return groupnames
    }

    func rangesOfNamedCaptureGroups(match:NSTextCheckingResult) throws -> [String:Range<Int>] {
        var ranges = [String:Range<Int>]()
        for (name,(_,_,n)) in try self.textCheckingResultsOfNamedCaptureGroups() {
            ranges[name] = match.rangeAtIndex(n+1).toRange()
        }
        return ranges
    }
}

Here is a usage example:

let node = "'test_literal'"
let regex = try NSRegularExpression(pattern: "^(?<delimiter>'|\")(?<value>.*)(?:\\k<delimiter>)$", options: NSRegularExpressionOptions.DotMatchesLineSeparators)
let match = regex.matchesInString(node, options: NSMatchingOptions.Anchored, range: NSRange(location: 0,length: node.utf16.count))
if match.count > 0 {
    let ranges = try regex.rangesOfNamedCaptureGroups(match[0])
    if let range = ranges["value"] {
        // range covers the text captured by the "value" group, in UTF-16 units
        print(range)
    }
}
Daniele Bernardini answered Nov 09 '22 22:11