I have the following:
Regex urlRx = new Regex(@"((https?|ftp|file)\://|www.)[A-Za-z0-9\.\-]+(/[A-Za-z0-9\?\#\&\=;\+!'\(\)\*\-\._~%]*)*", RegexOptions.IgnoreCase);
This matches all URLs, but I'd like to exclude those that are preceded by the characters " or '. I've been trying to achieve this using other solutions (Regex to exclude [ unless preceded by \) but haven't been able to get it to work.
If I have this, I should get a match:
The brown fox www.google.com
However, if I have this:
The brown fox <a href="www.google.com">boo</a>
I should not get a match, because of the ". How can this be achieved?
You need a negative lookbehind: prefix your regular expression with (?<!["']).
Explanation:
(?<!...) means: the text directly preceding the current position must not match the pattern inside the lookbehind (the ... part).
["'] is simply a character group containing the two characters you want to exclude.
Note: inside @"..." strings, double quotes are escaped by doubling them, so your code will read:
Regex urlRx = new Regex(@"(?<![""'])((https?|ftp|file)...
In VB:
Dim urlRx As New Regex("(?<![""'])((https?|ftp|file)...
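A minimal, self-contained sketch of the idea, assuming a .NET project that supports C# 9 top-level statements. It runs the question's pattern with and without the (?<!["']) prefix against the two sample inputs. The names UrlPattern, original and guarded are illustrative only and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// URL pattern copied verbatim from the question.
const string UrlPattern =
    @"((https?|ftp|file)\://|www.)[A-Za-z0-9\.\-]+(/[A-Za-z0-9\?\#\&\=;\+!'\(\)\*\-\._~%]*)*";

var original = new Regex(UrlPattern, RegexOptions.IgnoreCase);

// Same pattern with the negative lookbehind in front. In a verbatim string,
// "" is one double quote, so the regex engine sees (?<!["']): a match may
// not start immediately after a " or ' character.
var guarded = new Regex(@"(?<![""'])" + UrlPattern, RegexOptions.IgnoreCase);

string plain = "The brown fox www.google.com";
string quoted = "The brown fox <a href=\"www.google.com\">boo</a>";

Console.WriteLine(original.IsMatch(plain));  // True
Console.WriteLine(original.IsMatch(quoted)); // True (the unwanted match)
Console.WriteLine(guarded.IsMatch(plain));   // True
Console.WriteLine(guarded.IsMatch(quoted));  // False

// Limitation: the lookbehind only inspects the single character just before
// the match start, so a quoted URL with a scheme can still match from "www.".
Console.WriteLine(guarded.Match("<a href=\"http://www.google.com\">boo</a>").Value); // www.google.com
```

The last line shows a case worth keeping in mind: inside "http://www.google.com" the match is rejected at the h (preceded by a quote), but the engine can still start a new match at www., which is preceded by /. If such inputs matter, the lookbehind alone is not enough.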