I'm trying to do a simple regex match using NSRegularExpression, but I'm having some problems matching the string when the source contains multibyte characters:
let string = "D 9"
// The following matches (any characters)(SPACE)(numbers)(any characters)
let pattern = "([\\s\\S]*) ([0-9]*)(.*)"
let slen : Int = string.lengthOfBytesUsingEncoding(NSUTF8StringEncoding)
var error: NSError? = nil
var regex = NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions.DotMatchesLineSeparators, error: &error)
var result = regex?.stringByReplacingMatchesInString(string, options: nil, range: NSRange(location:0,
length:slen), withTemplate: "First \"$1\" Second: \"$2\"")
The code above returns "D" and "9", as expected.
If I now change the first line to include a UK 'Pound' currency symbol as follows:
let string = "£ 9"
Then the match doesn't work, even though the ([\\s\\S]*) part of the expression should still match any leading characters.
I understand that the £ symbol will take two bytes but the wildcard leading match should ignore those shouldn't it?
Can anyone explain what is going on here please?
It can be confusing. The first parameter of stringByReplacingMatchesInString() is mapped from NSString in Objective-C to String in Swift, but the range: parameter is still an NSRange. Therefore you have to specify the range in the units used by NSString (which is the number of UTF-16 code units, not UTF-8 bytes):
var result = regex?.stringByReplacingMatchesInString(string,
options: nil,
range: NSRange(location:0, length:(string as NSString).length),
withTemplate: "First \"$1\" Second: \"$2\"")
Alternatively, you can use count(string.utf16) instead of (string as NSString).length.
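To see why the byte count from the question goes wrong, it can help to compare the different lengths of the same string. The following is a minimal sketch in current Swift syntax (an assumption on my part; the rest of this answer uses the older Swift 1.x spelling, where the UTF-16 count is written count(string.utf16)). The emoji string and all variable names are only there for illustration.

```swift
import Foundation

// The string from the question: "£" is one UTF-16 code unit but two UTF-8 bytes.
let pound = "£ 9"
let poundUTF8 = pound.utf8.count                    // UTF-8 bytes
let poundUTF16 = pound.utf16.count                  // UTF-16 code units
let poundNSLength = NSString(string: pound).length  // NSString length, also UTF-16 code units

// An illustrative string outside the Basic Multilingual Plane:
// here even the Character count and the UTF-16 count disagree.
let emoji = "😀 9"
let emojiCharacters = emoji.count
let emojiUTF16 = emoji.utf16.count
let emojiUTF8 = emoji.utf8.count

print(poundUTF8, poundUTF16, poundNSLength)    // 4 3 3
print(emojiCharacters, emojiUTF16, emojiUTF8)  // 3 4 6
```

With "£ 9" the UTF-8 byte count is 4 while the NSString length is 3, so an NSRange built from the byte count extends past the end of the string.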
Full example:
let string = "£ 9"
let pattern = "([\\s\\S]*) ([0-9]*)(.*)"
var error: NSError? = nil
let regex = NSRegularExpression(pattern: pattern,
options: NSRegularExpressionOptions.DotMatchesLineSeparators,
error: &error)!
let result = regex.stringByReplacingMatchesInString(string,
options: nil,
range: NSRange(location:0, length:(string as NSString).length),
withTemplate: "First \"$1\" Second: \"$2\"")
println(result)
// First "£" Second: "9"
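For readers on a newer toolchain, here is a rough equivalent of the full example in current Swift syntax. This is a sketch under assumptions: it uses the renamed Foundation methods and the NSRange(_:in:) initializer, which builds the range in UTF-16 code units for you, and describeMatch is a hypothetical helper name introduced only for this illustration.

```swift
import Foundation

// Hypothetical helper, named only for this sketch.
func describeMatch(in string: String) throws -> String {
    // Same pattern as above: (any characters)(SPACE)(numbers)(any characters)
    let pattern = "([\\s\\S]*) ([0-9]*)(.*)"
    let regex = try NSRegularExpression(pattern: pattern,
                                        options: [.dotMatchesLineSeparators])
    // NSRange(_:in:) converts a Swift string range into UTF-16 based NSRange units.
    let range = NSRange(string.startIndex..<string.endIndex, in: string)
    return regex.stringByReplacingMatches(in: string,
                                          options: [],
                                          range: range,
                                          withTemplate: "First \"$1\" Second: \"$2\"")
}

// try! is acceptable here only because the pattern is a fixed literal that compiles.
let result = try! describeMatch(in: "£ 9")
print(result)
// First "£" Second: "9"
```

Letting NSRange(_:in:) compute the length avoids mixing up bytes, characters, and UTF-16 code units by hand.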