Regular expression: matching only if not ending in particular sequence

Tags:

regex

I would like to test whether a URL does NOT end in .html.

This is the pattern I came up with:

[/\w\.-]+[^\.html$]

The following matches because it does not end in .html:

/blog/category/subcategory/

This doesn't match because it ends in .html:

/blog/category/subcategory/index.html

However, the following does not match, even though I want it to, because it ends in .ht and not .html:

/blog/category/subcategory/index.ht

How should I change my pattern?
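The tail of the pattern above, `[^\.html$]`, is a negated character class. It matches exactly one character that is not `.`, `h`, `t`, `m`, `l` or `$`. It does not test for the sequence `.html`, so any URL whose last character is in that set is rejected.

The following sketch reproduces the behaviour described in the question. It uses Python's `re` as a stand-in engine and assumes whole-string matching (`fullmatch`), which is what the question's results imply. The extra `/about` URL is a hypothetical example.

```python
import re

# The question's pattern. The trailing [^\.html$] is a character class:
# it matches ONE final character that is not '.', 'h', 't', 'm', 'l' or '$'.
question_pattern = re.compile(r"[/\w\.-]+[^\.html$]")

for url in [
    "/blog/category/subcategory/",            # last char '/': accepted
    "/blog/category/subcategory/index.html",  # last char 'l': rejected
    "/blog/category/subcategory/index.ht",    # last char 't': also rejected
    "/about",                                 # last char 't': rejected, despite no .html
]:
    # fullmatch requires the whole URL to match the pattern
    print(url, bool(question_pattern.fullmatch(url)))
```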


asked Feb 11 '11 20:02

Kevin Le - Khnle

2 Answers

You can use a negative lookbehind assertion if your regular expression engine supports it:

^[/\w\.-]+(?<!\.html)$
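A minimal check of the lookbehind version, assuming Python's `re` as the engine. Python supports fixed-width lookbehind, and `(?<!\.html)` is always five characters wide. The pattern first consumes the whole URL. Then, at the end of the string, it asserts that the last five characters are not `.html`.

```python
import re

# Consume the whole URL, then look back: the text just before the end
# must not be ".html".
not_html = re.compile(r"^[/\w\.-]+(?<!\.html)$")

for url in [
    "/blog/category/subcategory/",
    "/blog/category/subcategory/index.html",
    "/blog/category/subcategory/index.ht",
]:
    print(url, "->", "match" if not_html.search(url) else "no match")
```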

If you don't have lookbehind assertions but you do have lookaheads then you can use that instead:

^(?!.*\.html$)[/\w\.-]+$
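The same check with the lookahead variant, again assuming Python's `re`. This time the assertion runs once, at the start of the string, before anything is consumed. It rejects the input if the rest of the string ends in `.html`.

```python
import re

# At position 0, assert that ".*\.html$" cannot match, i.e. the string does
# not end in ".html"; then require the whole string to be URL characters.
not_html_lookahead = re.compile(r"^(?!.*\.html$)[/\w\.-]+$")

for url in [
    "/blog/category/subcategory/",
    "/blog/category/subcategory/index.html",
    "/blog/category/subcategory/index.ht",
]:
    print(url, "->", "match" if not_html_lookahead.search(url) else "no match")
```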

See it working online: Rubular


answered Dec 06 '22 16:12

Mark Byers

What engine are you using? If it's one that supports lookahead assertions, you can do the following:

/((?!\.html$)[/\w.-])+/
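A quick way to see how this unanchored pattern behaves, assuming Python's `re` in place of whatever engine the asker uses. On a `.html` URL, a search still succeeds. It simply stops just before the dot, because that is the first position where `\.html$` would match.

```python
import re

# Each repetition checks the lookahead, then consumes one URL character.
pattern = re.compile(r"((?!\.html$)[/\w.-])+")

m = pattern.search("/blog/category/subcategory/index.html")
print(m.group(0))  # /blog/category/subcategory/index
```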

If we break it out into the components, it looks like this:

(            # start a group for the purposes of repeating
 (?!\.html$) # negative lookahead assertion for the pattern /\.html$/
 [/\w.-]     # your own pattern for matching a URL character
)+           # repeat the group
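The commented breakdown above can be used almost verbatim in engines that have a verbose (free-spacing) mode. The sketch below assumes Python's `re.VERBOSE` flag, under which unescaped whitespace outside character classes and `#` comments are ignored. It checks that the verbose form behaves the same as the compact one.

```python
import re

verbose = re.compile(r"""
    (               # start a group for the purposes of repeating
      (?!\.html$)   # negative lookahead: the rest of the text is not exactly .html
      [/\w.-]       # one URL character: slash, word character, dot or hyphen
    )+              # repeat the group
""", re.VERBOSE)

compact = re.compile(r"((?!\.html$)[/\w.-])+")

url = "/blog/category/subcategory/index.html"
print(verbose.search(url).group(0) == compact.search(url).group(0))  # True
```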

This means that, before it consumes each character, it checks that the pattern /\.html$/ cannot match at that position.
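To make the per-character check concrete, here is a hand-written loop that mirrors the anchored form of the pattern one step at a time. The helper names `is_url_char` and `not_html_by_hand` are hypothetical and exist only for illustration. The sketch assumes ASCII URLs with no newline characters. For that input, `\w` means letters, digits and underscore, and "`\.html$` matches at position i" means the remaining text is exactly `.html`.

```python
import re

def is_url_char(ch: str) -> bool:
    # Stand-in for [/\w.-] on ASCII input (\w = letters, digits, underscore).
    return ch in "/.-_" or ch.isalnum()

def not_html_by_hand(url: str) -> bool:
    # Mirrors ^((?!\.html$)[/\w.-])+$ one character at a time.
    if not url:
        return False                 # the + needs at least one repetition
    for i, ch in enumerate(url):
        if url[i:] == ".html":       # lookahead (?!\.html$) fails at position i
            return False
        if not is_url_char(ch):      # character class [/\w.-] fails
            return False
    return True                      # every character consumed, so $ matches

anchored = re.compile(r"^((?!\.html$)[/\w.-])+$")

for url in ["/blog/category/subcategory/",
            "/blog/category/subcategory/index.html",
            "/blog/category/subcategory/index.ht"]:
    print(url, not_html_by_hand(url), bool(anchored.match(url)))
```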

You may also want to anchor the entire pattern with ^ at the start and $ at the end to force it to match the entire URL; otherwise it is free to match only a portion of the URL. With this change, it becomes:

/^((?!\.html$)[/\w.-])+$/
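A sketch of the anchored version against the three URLs from the question, assuming Python's `re` as the engine:

```python
import re

# ^ and $ force the repeated group to cover the whole URL, so a URL that
# ends in ".html" can no longer match just a prefix.
anchored = re.compile(r"^((?!\.html$)[/\w.-])+$")

for url in [
    "/blog/category/subcategory/",            # True
    "/blog/category/subcategory/index.html",  # False
    "/blog/category/subcategory/index.ht",    # True
]:
    print(url, bool(anchored.search(url)))
```

In Python, `$` also matches just before a trailing newline. If that matters for your input, `re.fullmatch` or `\Z` is stricter.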

answered Dec 06 '22 16:12

Lily Ballard
