
Excluding regex matches that are preceded by a certain character

I have the following:

Regex urlRx = new Regex(@"((https?|ftp|file)\://|www.)[A-Za-z0-9\.\-]+(/[A-Za-z0-9\?\#\&\=;\+!'\(\)\*\-\._~%]*)*", RegexOptions.IgnoreCase);

This matches all URLs, but I'd like to exclude those that are immediately preceded by the character " or '. I've tried adapting other solutions (such as "Regex to exclude [ unless preceded by \") but haven't been able to get them to work.

If I have this, I should get a match:

The brown fox www.google.com

However, if I have this:

The brown fox <a href="www.google.com">boo</a>

I should not get a match, because of the ". How can this be achieved?

SB2055 asked Feb 06 '23 11:02



1 Answer

You need a negative lookbehind: prefix your regular expression with (?<!["']).

Explanation:

  • (?<!...) means: the text immediately before the current position must not match the pattern written in place of the dots.
  • ["'] is simply a character class containing the two characters you want to exclude.

Note: Inside @"..." strings, double quotes are escaped by doubling them, so your code will read:

Regex urlRx = new Regex(@"(?<![""'])((https?|ftp|file)...

In VB:

Dim urlRx As New Regex("(?<![""'])((https?|ftp|file)...
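
As a quick check of the lookbehind, here is a minimal C# sketch that prepends it to the full pattern from the question and runs it against the two sample inputs. The class name UrlLookbehindDemo and the helper HasUnquotedUrl are made up for illustration; they are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlLookbehindDemo
{
    // The question's pattern with the negative lookbehind (?<!["']) prepended.
    // Inside a verbatim string (@"..."), a double quote is written as "".
    private static readonly Regex UrlRx = new Regex(
        @"(?<![""'])((https?|ftp|file)\://|www.)[A-Za-z0-9\.\-]+(/[A-Za-z0-9\?\#\&\=;\+!'\(\)\*\-\._~%]*)*",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: true if the input contains a URL whose first
    // character is not directly preceded by " or '.
    public static bool HasUnquotedUrl(string input) => UrlRx.IsMatch(input);

    public static void Main()
    {
        // Preceded by a space, so the lookbehind allows the match.
        Console.WriteLine(HasUnquotedUrl("The brown fox www.google.com")); // True

        // Preceded by ", so the lookbehind rejects the match.
        Console.WriteLine(HasUnquotedUrl("The brown fox <a href=\"www.google.com\">boo</a>")); // False
    }
}
```

One thing to keep in mind: the lookbehind only inspects the single character before the position where a match would start. With an input such as href="http://www.google.com", the match starting at http is rejected, but the pattern can still match starting at www, because that position is preceded by / rather than a quote.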
Heinzi answered Feb 08 '23 16:02
