
Regular expression: matching only if not ending in particular sequence

Tags:

regex

I would like to test for a URL that does NOT end in .html

This is the pattern I came up with:

[/\w\.-]+[^\.html$]

The following matches because it does not end in .html

/blog/category/subcategory/

This doesn't match because it ends in .html:

/blog/category/subcategory/index.html

However, the following does not match, although I want it to, because it ends in .ht and not .html:

/blog/category/subcategory/index.ht

How should I change my pattern?

Kevin Le - Khnle asked Feb 11 '11 20:02




2 Answers

You can use a negative lookbehind assertion if your regular expression engine supports it:

^[/\w\.-]+(?<!\.html)$

If you don't have lookbehind assertions but you do have lookaheads, you can use a negative lookahead instead:

^(?!.*\.html$)[/\w\.-]+$

See it working online: rubular
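As a rough check, here is a minimal sketch that runs both patterns against the three URLs from the question. It assumes Python's `re` module, which supports both fixed-width lookbehind and lookahead; the helper name `not_html` is hypothetical and not part of the answer.

```python
import re

# Both patterns are copied from the answer above.
LOOKBEHIND = re.compile(r"^[/\w\.-]+(?<!\.html)$")
LOOKAHEAD = re.compile(r"^(?!.*\.html$)[/\w\.-]+$")

def not_html(url, pattern=LOOKBEHIND):
    """Return True when the whole URL matches, i.e. it does not end in .html."""
    return pattern.search(url) is not None

urls = [
    "/blog/category/subcategory/",
    "/blog/category/subcategory/index.html",
    "/blog/category/subcategory/index.ht",
]
for url in urls:
    # Print the verdict of each pattern side by side.
    print(url, not_html(url, LOOKBEHIND), not_html(url, LOOKAHEAD))
```

Both patterns should accept the first and third URL and reject the second.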

Mark Byers answered Dec 06 '22 16:12



What engine are you using? If it's one that supports lookahead assertions, you can do the following:

/((?!\.html$)[/\w.-])+/

If we break it out into the components, it looks like this:

(            # start a group for the purposes of repeating
 (?!\.html$) # negative lookahead assertion for the pattern /\.html$/
 [/\w.-]     # your own pattern for matching a URL character
)+           # repeat the group

This means that, before it consumes each character, it tests that the pattern /\.html$/ can't match at that position.
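To see the per-character check in action, here is a small sketch, assuming Python's `re` module and the pattern written without its surrounding `/` delimiters. The helper `consumed_prefix` is a hypothetical name used only for illustration; it reports how far the repeated group gets before the lookahead stops it.

```python
import re

# Unanchored pattern from the answer, without the / delimiters.
TEMPERED = re.compile(r"((?!\.html$)[/\w.-])+")

def consumed_prefix(url):
    """Return the part of url that the pattern consumes starting at index 0."""
    m = TEMPERED.match(url)
    return m.group(0) if m else ""

# The group stops repeating right before a trailing ".html" ...
print(consumed_prefix("/blog/category/subcategory/index.html"))
# ... but runs to the end when the URL ends in ".ht".
print(consumed_prefix("/blog/category/subcategory/index.ht"))
```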

You may also want to anchor the entire pattern with ^ at the start and $ at the end to force it to match the entire URL - otherwise it's free to match only a portion of the URL. With this change, it becomes:

/^((?!\.html$)[/\w.-])+$/
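
A quick way to try the anchored pattern, sketched in Python on the assumption that its `re` module is the engine in use (the `/` delimiters are dropped, and `accepts` is a hypothetical helper name):

```python
import re

# Anchored pattern from the answer, without the / delimiters.
ANCHORED = re.compile(r"^((?!\.html$)[/\w.-])+$")

def accepts(url):
    """Return True when the entire URL matches the anchored pattern."""
    return ANCHORED.search(url) is not None

for url in (
    "/blog/category/subcategory/",
    "/blog/category/subcategory/index.html",
    "/blog/category/subcategory/index.ht",
):
    print(url, accepts(url))
```

With the anchors in place, the URL ending in .html is rejected outright, because the group cannot consume the final ".html" and $ cannot match before it.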
Lily Ballard answered Dec 06 '22 16:12
