
C# Regular Expression excluding a string

Tags:

c#

regex

I have a collection of strings, and all I want from the regex is to collect every one that starts with http:

href="http://www.test.com/cat/1-one_piece_episodes/"href="http://www.test.com/cat/2-movies_english_subbed/"href="http://www.test.com/cat/3-english_dubbed/"href="http://www.exclude.com"

This is my regular expression pattern:

href="(.*?)[^#]"

and it returns this:

href="http://www.test.com/cat/1-one_piece_episodes/"
href="http://www.test.com/cat/2-movies_english_subbed/"
href="http://www.xxxx.com/cat/3-english_dubbed/"
href="http://www.exclude.com"

What is the pattern for excluding the last match, or for excluding any match that contains the exclude domain, such as href="http://www.exclude.com"?

EDIT: for multiple exclusions:

href="((?:(?!"|\bexclude\b|\bxxxx\b).)*)[^#]"
Vincent Dagpin asked Aug 05 '11 12:08



2 Answers

@ridgerunner and I would change the regex to:

href="((?:(?!\bexclude\b)[^"])*)[^#]"

It matches all href attributes as long as they don't end in # and don't contain the word exclude.

Explanation:

href="     # Match href="
(          # Capture...
 (?:       # the following group:
  (?!      # Look ahead to check that the next part of the string isn't...
   \b      # the entire word
   exclude # exclude
   \b      # (\b are word boundary anchors)
  )        # End of lookahead
  [^"]     # If successful, match any character except for a quote
 )*        # Repeat as often as possible
)          # End of capturing group 1
[^#]"      # Match a non-# character and the closing quote.
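As a minimal sketch of how the pattern above could be applied from C#, the following helper runs it over an input string and collects the full text of each match. The class and method names (`HrefFilter`, `Extract`) are hypothetical and chosen only for illustration; the pattern itself is the one from this answer, written as a verbatim string in which each literal quote is doubled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class HrefFilter
{
    // The answer's pattern. Inside a verbatim string (@"..."),
    // a literal double quote is written as two double quotes.
    private const string Pattern = @"href=""((?:(?!\bexclude\b)[^""])*)[^#]""";

    // Returns the full text of every href attribute that matches the pattern.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // m.Value is the whole match, including href=" and the closing quote.
            results.Add(m.Value);
        }
        return results;
    }
}
```

Calling `HrefFilter.Extract` on the sample input from the question should yield the three test.com attributes and skip the exclude.com one. Note that capturing group 1 stops one character short of the closing quote, because the trailing `[^#]` consumes the last character of the URL; that is why the sketch reads `m.Value` rather than `m.Groups[1].Value`.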

To allow multiple "forbidden words":

href="((?:(?!\b(?:exclude|this|too)\b)[^"])*)[^#]"
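If the forbidden words are only known at run time, the alternation can be assembled from a list. The sketch below is one way to do that, assuming a non-empty list of words; `HrefExclusion` and `Build` are hypothetical names, and `Regex.Escape` is used so that words containing regex metacharacters are treated literally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class HrefExclusion
{
    // Builds the multi-word pattern from this answer for the given words.
    public static Regex Build(IEnumerable<string> forbiddenWords)
    {
        // Escape each word so characters such as '.' or '+' are matched literally.
        List<string> escaped = forbiddenWords.Select(w => Regex.Escape(w)).ToList();

        // Assumption: an empty list is a caller error. An empty alternation
        // would make the lookahead reject every word boundary.
        if (escaped.Count == 0)
            throw new ArgumentException("At least one word is required.", nameof(forbiddenWords));

        string alternation = string.Join("|", escaped);
        string pattern = "href=\"((?:(?!\\b(?:" + alternation + ")\\b)[^\"])*)[^#]\"";
        return new Regex(pattern);
    }
}
```

For example, `HrefExclusion.Build(new[] { "exclude", "xxxx" })` produces the equivalent of `href="((?:(?!\b(?:exclude|xxxx)\b)[^"])*)[^#]"`, which can then be passed to `Matches` in the usual way.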
Tim Pietzcker answered Nov 14 '22 04:11



Your input doesn't look like a valid string (unless you escape the quotes in it), but you can also do this without regex:

string input = "href=\"http://www.test.com/cat/1-one_piece_episodes/\"href=\"http://www.test.com/cat/2-movies_english_subbed/\"href=\"http://www.test.com/cat/3-english_dubbed/\"href=\"http://www.exclude.com\"";

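// Requires: using System; using System.Collections.Generic;
List<string> matches = new List<string>();

// Split on "href"; RemoveEmptyEntries drops the empty piece before the first "href".
foreach (var match in input.Split(new string[] { "href" }, StringSplitOptions.RemoveEmptyEntries)) {
   // Keep every piece that does not mention the excluded domain.
   if (!match.Contains("exclude.com"))
      matches.Add("href" + match);
}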
Mrchief answered Nov 14 '22 06:11
