Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
Mixing named and numbered capturing groups is not recommended because flavors are inconsistent in how the groups are numbered. If a group doesn't need a name, make it non-capturing using the (?:group) syntax. In .NET you can make all unnamed groups non-capturing by setting RegexOptions.
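As a rough illustration of how (?:group) changes group numbering, here is a small Swift sketch using NSRegularExpression. The helper name firstMatchGroups, the patterns, and the sample string are made up for this example, and the code assumes Swift 4 or later, where the range accessor is spelled range(at:).

```swift
import Foundation

// Hypothetical helper: returns the substrings of the first match,
// with index 0 holding the whole match and 1... the capturing groups.
func firstMatchGroups(pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let nsText = text as NSString
    let fullRange = NSRange(location: 0, length: nsText.length)
    guard let match = regex.firstMatch(in: text, options: [], range: fullRange) else { return [] }
    var groups: [String] = []
    for index in 0..<match.numberOfRanges {
        let range = match.range(at: index)
        // A group that took no part in the match reports NSNotFound.
        groups.append(range.location == NSNotFound ? "" : nsText.substring(with: range))
    }
    return groups
}

// Both groups capture: whole match, letters, digits.
print(firstMatchGroups(pattern: "([A-Z]+)([0-9]+)", in: "AA01"))
// The letters group is non-capturing, so the digits become group 1.
print(firstMatchGroups(pattern: "(?:[A-Z]+)([0-9]+)", in: "AA01"))
```

The second call returns one fewer element because the non-capturing group does not get a number of its own.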
Capturing groups are a handy feature of regular expression matching that allows us to query the Match object to find out which part of the string matched a particular part of the regular expression. Anything in plain parentheses () is a capture group.
Groups group multiple patterns as a whole, and capturing groups provide extra submatch information when using a regular expression pattern to match against a string. Backreferences refer to a previously captured group in the same regular expression.
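A backreference can be sketched in Swift as follows. This is only an illustration: the helper name firstRepeatedWord and the sample sentences are invented here, and the code assumes Swift 4 or later. The pattern captures a word in group 1 and then uses \1 to require the same text again.

```swift
import Foundation

// Hypothetical helper: returns the first word that appears twice in a row, if any.
func firstRepeatedWord(in text: String) -> String? {
    // (\w+) captures a word; \1 must then match the same text again.
    let pattern = "\\b(\\w+) \\1\\b"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
    let nsText = text as NSString
    let fullRange = NSRange(location: 0, length: nsText.length)
    guard let match = regex.firstMatch(in: text, options: [], range: fullRange) else { return nil }
    // Group 1 holds the word that was repeated.
    return nsText.substring(with: match.range(at: 1))
}

print(firstRepeatedWord(in: "this is is a test") ?? "none")
print(firstRepeatedWord(in: "no repeats here") ?? "none")
```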
Use rangeAtIndex: to get the range of the first capturing group:
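for (NSTextCheckingResult *match in matches) {
    // [match range] (index 0) is the whole match; index 1 is the first capturing group.
    NSRange matchRange = [match rangeAtIndex:1];
    // Skip matches where the group did not participate.
    if (matchRange.location == NSNotFound) continue;
    NSString *matchString = [htmlString substringWithRange:matchRange];
    NSLog(@"%@", matchString);
}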
Don't parse HTML with regular expressions or NSScanner. Down that path lies madness.
This has been asked many times on SO, for example in "parsing HTML on the iPhone".
The data I am picking out is as simple as
<td>Name: A name</td>
and I think it's simple enough to just use regular expressions instead of including a full-blown HTML parser in the project.
It's up to you, and I'm a strong advocate of "first to market has a huge advantage".
The difference being that with a proper HTML parser, you are considering the structure of the document. Using regular expressions, you are relying on the document never changing format in ways that are syntactically otherwise perfectly valid.
For example, what if the input were <td class="name">Name: A name</td>? Your regex parser just broke on input that is both valid HTML and, from a tag contents perspective, identical to the original input.
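The failure mode can be reproduced with a short Swift sketch. The asker's actual regex is not shown, so the strict pattern below is an assumption about what such a regex might look like, and extractName is a hypothetical helper name. The code assumes Swift 4 or later.

```swift
import Foundation

// Hypothetical helper using an assumed exact-tag pattern.
func extractName(from html: String) -> String? {
    // Matches only a bare <td> tag; any attribute on the tag defeats it.
    let pattern = "<td>Name: (.*?)</td>"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
    let nsHTML = html as NSString
    let fullRange = NSRange(location: 0, length: nsHTML.length)
    guard let match = regex.firstMatch(in: html, options: [], range: fullRange) else { return nil }
    return nsHTML.substring(with: match.range(at: 1))
}

let original = "<td>Name: A name</td>"
let restyled = "<td class=\"name\">Name: A name</td>"

print(extractName(from: original) ?? "no match")
print(extractName(from: restyled) ?? "no match")
```

The first input yields the name, while the second, equally valid, input yields no match at all. Loosening the pattern fixes this one case but not the next structural change, which is the argument for a real parser.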
In Swift 3 (from Swift 4 onward, rangeAt(_:) is spelled range(at:)):
import Foundation
/// Two groups. 1: [A-Z]+, 2: [0-9]+
let pattern = "([A-Z]+)([0-9]+)"
let regex = try NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
let str = "AA01B2C3DD4"
let nsStr = str as NSString
// NSRange counts UTF-16 code units, so use NSString.length rather than the Character count.
let results = regex.matches(in: str, options: [], range: NSMakeRange(0, nsStr.length))
for a in results {
    let c = a.numberOfRanges //< 3: the whole match plus the two groups
    print(c)
    let m0 = a.rangeAt(0) //< Whole match, ex: 'AA01'
    let m1 = a.rangeAt(1) //< Group 1: alphabetic chars, ex: 'AA'
    let m2 = a.rangeAt(2) //< Group 2: digits, ex: '01'
    // let m3 = a.rangeAt(3) //< Out of bounds: raises a runtime exception
    let s = nsStr.substring(with: m2)
    print(s)
}
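One case the loop above does not cover is an optional group that takes no part in a match; NSRegularExpression then reports a range whose location is NSNotFound, and passing that range to substring(with:) would fail. A minimal sketch of a defensive version, assuming Swift 4 or later, is shown below. The helper name captureGroups and the sample pattern are invented for illustration.

```swift
import Foundation

// Hypothetical helper: for every match, returns the whole match followed by each
// capturing group, using nil for groups that did not participate.
func captureGroups(pattern: String, in text: String) throws -> [[String?]] {
    let regex = try NSRegularExpression(pattern: pattern)
    // Build the search range from String indices so UTF-16 lengths are handled for us.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    var rows: [[String?]] = []
    for match in regex.matches(in: text, options: [], range: fullRange) {
        var groups: [String?] = []
        for index in 0..<match.numberOfRanges {
            let nsRange = match.range(at: index)
            if nsRange.location != NSNotFound, let range = Range(nsRange, in: text) {
                groups.append(String(text[range]))
            } else {
                groups.append(nil) // group did not take part in this match
            }
        }
        rows.append(groups)
    }
    return rows
}

// The digits group is optional here, so "B" matches without it.
if let rows = try? captureGroups(pattern: "([A-Z]+)([0-9]+)?", in: "AA01 B") {
    for row in rows {
        print(row)
    }
}
```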
HTML isn't a regular language and can't be properly parsed using regular expressions. Here's a classic SO answer explaining this common programmer misconception.