
How to write regular expressions in Objective C (NSRegularExpression)?

This regex works when I test it in PHP, but it doesn't work in Objective-C:

(?:www\.)?((?!-)[a-zA-Z0-9-]{2,63}(?<!-))\.?((?:[a-zA-Z0-9]{2,})?(?:\.[a-zA-Z0-9]{2,})?) 

I tried escaping the escape characters but that doesn't help either. Should I escape any other character?

This is my Objective-C code:

NSMutableString *searchedString = [NSMutableString stringWithString:@"domain-name.tld.tld2"];
NSError* error = nil;

NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:@"(?:www\\.)?((?!-)[a-zA-Z0-9-]{2,63}(?<!-))\\.?((?:[a-zA-Z0-9]{2,})?(?:\\.[a-zA-Z0-9]{2,})?)" options:0 error:&error];
NSArray* matches = [regex matchesInString:searchedString options:0 range:NSMakeRange(0, [searchedString length])];
for ( NSTextCheckingResult* match in matches ) {
    NSString* matchText = [searchedString substringWithRange:[match range]];
    NSLog(@"match: %@", matchText);
}

-- UPDATE --

In PHP this regex returns an array with the values "domain-name" and "tld.tld2", but in Objective-C I get only one value: "domain-name.tld.tld2".

-- UPDATE 2 --

This regex extracts 'domain name' and 'TLD' from the string:

  • domain.com = (domain, com)
  • domain.co.uk = (domain, co.uk)
  • -test-domain.co.u = (test-domain, co)
  • -test-domain.co.uk- = (test-domain, co.uk)
  • -test-domain.co.u-k = (test-domain, co)
  • -test-domain.co-m = (test-domain)
  • -test-domain-.co.uk = (test-domain)

It takes the valid domain name (between 2 and 63 characters long, not starting or ending with '-') and up to two parts of a TLD, provided each part is valid (at least two characters long and containing only letters and numbers).

Hope this explanation helps.

budiDino asked Feb 14 '12 11:02





2 Answers

An NSTextCheckingResult holds several ranges, which you read by index:

[match rangeAtIndex:0]; is the full match.
[match rangeAtIndex:1]; (if it exists) is the first capture group match.
etc.

You can use something like this:

NSString *searchedString = @"domain-name.tld.tld2";
NSRange   searchedRange = NSMakeRange(0, [searchedString length]);
NSString *pattern = @"(?:www\\.)?((?!-)[a-zA-Z0-9-]{2,63}(?<!-))\\.?((?:[a-zA-Z0-9]{2,})?(?:\\.[a-zA-Z0-9]{2,})?)";
NSError  *error = nil;

NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
NSArray* matches = [regex matchesInString:searchedString options:0 range:searchedRange];
for (NSTextCheckingResult* match in matches) {
    NSString* matchText = [searchedString substringWithRange:[match range]];
    NSLog(@"match: %@", matchText);
    NSRange group1 = [match rangeAtIndex:1];
    NSRange group2 = [match rangeAtIndex:2];
    NSLog(@"group1: %@", [searchedString substringWithRange:group1]);
    NSLog(@"group2: %@", [searchedString substringWithRange:group2]);
}

NSLog output:

match: domain-name.tld.tld2
group1: domain-name
group2: tld.tld2

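Check that each range is valid before using it: rangeAtIndex: returns {NSNotFound, 0} for a capture group that did not participate in the match, and passing that range to substringWithRange: raises an exception.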

More simply, when only the first match is needed:

NSString *searchedString = @"domain-name.tld.tld2";
NSRange   searchedRange = NSMakeRange(0, [searchedString length]);
NSString *pattern = @"(?:www\\.)?((?!-)[a-zA-Z0-9-]{2,63}(?<!-))\\.?((?:[a-zA-Z0-9]{2,})?(?:\\.[a-zA-Z0-9]{2,})?)";
NSError  *error = nil;

NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
NSTextCheckingResult *match = [regex firstMatchInString:searchedString options:0 range:searchedRange];
NSLog(@"group1: %@", [searchedString substringWithRange:[match rangeAtIndex:1]]);
NSLog(@"group2: %@", [searchedString substringWithRange:[match rangeAtIndex:2]]);

Swift 3.0:

let searchedString = "domain-name.tld.tld2"
let nsSearchedString = searchedString as NSString
// NSRange counts UTF-16 code units, so take the length from the NSString.
let searchedRange = NSMakeRange(0, nsSearchedString.length)
let pattern = "(?:www\\.)?((?!-)[a-zA-Z0-9-]{2,63}(?<!-))\\.?((?:[a-zA-Z0-9]{2,})?(?:\\.[a-zA-Z0-9]{2,})?)"

do {
    let regex = try NSRegularExpression(pattern: pattern, options: [])
    let matches = regex.matches(in: searchedString, options: [], range: searchedRange)
    for match in matches {
        let matchText = nsSearchedString.substring(with: match.range)
        print("match: \(matchText)")

        // rangeAt(_:) is the Swift 3 spelling; Swift 4 and later use range(at:).
        let group1: NSRange = match.rangeAt(1)
        let matchText1 = nsSearchedString.substring(with: group1)
        print("matchText1: \(matchText1)")

        let group2 = match.rangeAt(2)
        let matchText2 = nsSearchedString.substring(with: group2)
        print("matchText2: \(matchText2)")
    }
} catch let error as NSError {
    print(error.localizedDescription)
}

print output:

match: domain-name.tld.tld2
matchText1: domain-name
matchText2: tld.tld2

More simply, when only the first match is needed:

do {
    let regex = try NSRegularExpression(pattern: pattern, options: [])
    // firstMatch returns nil when nothing matches, so unwrap it instead of forcing it.
    if let match = regex.firstMatch(in: searchedString, options: [], range: searchedRange) {
        let matchText1 = nsSearchedString.substring(with: match.rangeAt(1))
        print("matchText1: \(matchText1)")

        let matchText2 = nsSearchedString.substring(with: match.rangeAt(2))
        print("matchText2: \(matchText2)")
    }
} catch let error as NSError {
    print(error.localizedDescription)
}

print output:

matchText1: domain-name
matchText2: tld.tld2
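
The advice above to check that the match ranges are valid could look something like this in Swift. This is only a minimal sketch, assuming Swift 4 or later (where rangeAt(_:) is spelled range(at:)); captureGroups(in:pattern:) is a hypothetical helper written for this example, not a Foundation API.

```swift
import Foundation

// Hypothetical helper (not part of Foundation): returns the text of each
// capture group of the first match, with nil for a group that did not take part.
func captureGroups(in text: String, pattern: String) -> [String?]? {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return nil // the pattern failed to compile
    }
    let nsText = text as NSString
    // NSRange counts UTF-16 code units, so use NSString.length, not String.count.
    let fullRange = NSRange(location: 0, length: nsText.length)
    guard let match = regex.firstMatch(in: text, options: [], range: fullRange) else {
        return nil // no match at all
    }
    // Index 0 is the whole match; capture groups start at index 1.
    return (1..<match.numberOfRanges).map { index -> String? in
        let range = match.range(at: index)
        // A group that did not participate has location NSNotFound.
        return range.location == NSNotFound ? nil : nsText.substring(with: range)
    }
}

let pattern = "(?:www\\.)?((?!-)[a-zA-Z0-9-]{2,63}(?<!-))\\.?((?:[a-zA-Z0-9]{2,})?(?:\\.[a-zA-Z0-9]{2,})?)"
if let groups = captureGroups(in: "domain-name.tld.tld2", pattern: pattern) {
    print(groups)
}

// With alternation only one group takes part, so the other comes back as nil.
if let groups = captureGroups(in: "b", pattern: "(a)|(b)") {
    print(groups)
}
```

Returning optionals leaves it to the caller to decide what a missing group means, rather than crashing on an invalid range.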

zaph answered Oct 08 '22 18:10



According to Apple's documentation, these characters must be quoted (using \) to be treated as literals:

* ? + [ ( ) { } ^ $ | \ . / 

It would also help if you could explain what you are trying to achieve. Do you have any test fixtures?
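
To illustrate the escaping rule above, here is a small Swift sketch. Two things are assumed for the example: Swift 5 or later for the raw string literal, and a hypothetical helper matchesWhole(_:pattern:) that is not part of Foundation. NSRegularExpression.escapedPattern(for:) is the Foundation call that adds the backslashes for you when a string should be matched literally.

```swift
import Foundation

// In source code the backslash is itself escaped, so the regex \. is written "\\.".
// Swift 5 raw strings avoid the doubling; both literals below are the same string.
let dotEscaped = "\\."
let dotRaw = #"\."#
print(dotEscaped == dotRaw)

// Hypothetical helper for this sketch: true if `pattern` matches all of `text`.
func matchesWhole(_ text: String, pattern: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: "^(?:" + pattern + ")$", options: []) else {
        return false
    }
    let range = NSRange(location: 0, length: (text as NSString).length)
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

// escapedPattern(for:) adds the backslashes needed to match a string literally.
let literal = "co.uk"
let escaped = NSRegularExpression.escapedPattern(for: literal)

print(matchesWhole("coXuk", pattern: literal)) // unescaped "." matches any character
print(matchesWhole("coXuk", pattern: escaped)) // escaped "." matches only a dot
print(matchesWhole("co.uk", pattern: escaped))
```

The pattern in the question already doubles its backslashes correctly, so escaping alone would not explain the difference from PHP.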

hwaxxer answered Oct 08 '22 20:10
