Regex - Match any sequence of characters except a particular word in a URL

Tags: c#, regex

I want to match a URL that contains any sequence of valid URL characters, but not a particular word. The URL in question is http://gateway.ovid.com, and I want to match anything except the word 'gateway', so:

  • http://abc123.ovid.com - would match
  • http://abc.123.ovid.com - would match
  • http://abc-123.ovid.com - would match
  • http://fdfsffdfs.ovid.com - would match

but

  • http://gateway.ovid.com - would NOT match

Something like the following:

^http://([a-z0-9\-\.]+|(?<!gateway))\.ovid\.com$

but it doesn't seem to work.


Update: Sorry, I forgot to mention the language; it's C# (.NET).

Sunday Ironfoot asked Aug 13 '10 17:08



1 Answer

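Your regex is almost correct; the only problem is the extra '|' after the '+'. With the '|' in place, the first alternative [a-z0-9\-\.]+ matches 'gateway' on its own, so the lookbehind in the second alternative is never needed and every subdomain is accepted. Remove the '|' so that the lookbehind is checked immediately after the subdomain has been matched: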

^http://([a-z0-9\-\.]+(?<!gateway))\.ovid\.com$
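
To see the difference, here is a minimal C# sketch that runs the question's pattern and the corrected pattern against the sample URLs. The class name OvidUrlDemo and the method IsAllowed are hypothetical names chosen for this illustration. The third pattern, which uses a negative lookahead instead, is not part of the answer above; it is an assumed alternative shown only because the lookbehind version also rejects any subdomain that merely ends in 'gateway'.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OvidUrlDemo
{
    // Pattern from the question: the '|' makes the lookbehind an alternative
    // to the subdomain, so "gateway" is accepted by the first branch.
    public const string OriginalPattern =
        @"^http://([a-z0-9\-\.]+|(?<!gateway))\.ovid\.com$";

    // Pattern from the answer: the lookbehind runs right after the subdomain
    // and fails when the text just consumed ends with "gateway".
    public const string FixedPattern =
        @"^http://([a-z0-9\-\.]+(?<!gateway))\.ovid\.com$";

    // Assumed alternative (not from the answer): a negative lookahead that
    // rejects only the exact host gateway.ovid.com.
    public const string LookaheadPattern =
        @"^http://(?!gateway\.ovid\.com$)[a-z0-9\-\.]+\.ovid\.com$";

    // Hypothetical helper: true when the URL passes the corrected pattern.
    public static bool IsAllowed(string url)
    {
        return Regex.IsMatch(url, FixedPattern);
    }

    public static void Main()
    {
        string[] urls =
        {
            "http://abc123.ovid.com",
            "http://abc.123.ovid.com",
            "http://abc-123.ovid.com",
            "http://fdfsffdfs.ovid.com",
            "http://gateway.ovid.com",
            "http://mygateway.ovid.com" // extra case: ends in "gateway"
        };

        foreach (string url in urls)
        {
            Console.WriteLine(
                "{0}  original={1}  fixed={2}  lookahead={3}",
                url,
                Regex.IsMatch(url, OriginalPattern),
                Regex.IsMatch(url, FixedPattern),
                Regex.IsMatch(url, LookaheadPattern));
        }
    }
}
```

With these patterns, the original accepts http://gateway.ovid.com while the corrected one rejects it. Note that the corrected pattern also rejects http://mygateway.ovid.com, because the lookbehind only checks that the subdomain does not end with 'gateway'; if only the exact host should be excluded, the lookahead variant is one way to express that.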

Gopi answered Oct 13 '22 19:10