 

RegEx - Exclude Matched Patterns

I have the below patterns to be excluded.

make it cheaper
make it cheapere
makeitcheaper.com.au
makeitcheaper
making it cheaper
www.make it cheaper
ww.make it cheaper.com

I've created a regex that matches any of these. However, I want to match everything other than these, and I am not sure how to invert the regex I've created.

mak(e|ing) ?it ?cheaper

The above pattern matches all the strings listed. Now I want it to match everything else. How do I do it?

From searching, it seems I need something like a negative lookahead or lookbehind, but I don't really get it. Can someone point me in the right direction?
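As a quick sanity check of the claim that the pattern matches all the listed strings, here is a small Python sketch (Python is my choice for illustration; the question doesn't name a language). It suggests the claim holds when searching anywhere in the string, while a whole-string match only succeeds for some of the samples. That distinction matters for the answers below.

```python
import re

# The pattern and the sample strings are taken from the question.
PATTERN = re.compile(r"mak(e|ing) ?it ?cheaper")

SAMPLES = [
    "make it cheaper",
    "make it cheapere",
    "makeitcheaper.com.au",
    "makeitcheaper",
    "making it cheaper",
    "www.make it cheaper",
    "ww.make it cheaper.com",
]

# search() looks for the pattern anywhere in the string.
found_anywhere = [s for s in SAMPLES if PATTERN.search(s)]

# fullmatch() requires the pattern to cover the whole string.
matched_entirely = [s for s in SAMPLES if PATTERN.fullmatch(s)]

print(len(found_anywhere), "of", len(SAMPLES), "samples contain the pattern")
for s in matched_entirely:
    print("whole-string match:", s)
```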

San asked Aug 14 '13 20:08




2 Answers

You can just put it in a negative look-ahead like so:

(?!mak(e|ing) ?it ?cheaper)

On its own that isn't going to work, though. If you do a matches[1], it won't match any non-empty string, since a look-ahead only checks what follows and doesn't actually consume anything. If you do a find[1], it will match many times, since there are lots of positions in the string where the next characters don't match the above.
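To make that concrete, here is a rough Python sketch, assuming `re.fullmatch` plays the role of matches and `re.finditer` the role of repeated find calls (the helper name `count_find_hits` is made up for this example):

```python
import re

# The bare negative look-ahead from above, with nothing around it.
BARE_LOOKAHEAD = re.compile(r"(?!mak(e|ing) ?it ?cheaper)")


def count_find_hits(text):
    """Count the (empty) matches a repeated 'find' would report."""
    return sum(1 for _ in BARE_LOOKAHEAD.finditer(text))


# A whole-string match fails: the look-ahead consumes no characters,
# so it can never cover a non-empty string.
matches_hello = BARE_LOOKAHEAD.fullmatch("hello") is not None

# A find succeeds at every position where the pattern does not start,
# including positions inside a string we wanted to exclude.
hits_hello = count_find_hits("hello")
hits_excluded = count_find_hits("make it cheaper")

print(matches_hello, hits_hello, hits_excluded)
```

Note that the find still reports hits for "make it cheaper", because only position 0 is rejected by the look-ahead.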

To fix this, depending on what you wish to do, we have 2 choices:

  1. If you want to exclude all strings that are exactly one of those (i.e. "make it cheaperblahblah" is not excluded), check for start (^) and end ($) of string:

    ^(?!mak(e|ing) ?it ?cheaper$).*
    

    The .* (zero or more wild-cards) is the actual matching taking place. The negative look-ahead checks from the first character.

  2. If you want to exclude all strings containing one of those, you can make sure the look-ahead isn't matched before every character we match:

    ^((?!mak(e|ing) ?it ?cheaper).)*$
    

    An alternative is to add wild-cards to the beginning of your look-ahead (i.e. exclude all strings that, from the start of the string, contain anything, then your pattern), but I don't currently see any advantage to this (arbitrary length look-ahead is also less likely to be supported by any given tool):

    ^(?!.*mak(e|ing) ?it ?cheaper).*
    

Because of the ^ and $, either doing a find or a matches will work for either of the above (though, in the case of matches, the ^ is optional and, in the case of find, the .* outside the look-ahead is optional).
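Here is a small Python sketch comparing the three patterns above. The names `EXACT`, `TEMPERED`, `PREFIXED` and `accepted` are mine, and I'm assuming single-line input, since `.` doesn't match a newline by default.

```python
import re

# Option 1: reject only strings that are exactly one of the phrases.
EXACT = re.compile(r"^(?!mak(e|ing) ?it ?cheaper$).*")
# Option 2: reject strings containing the phrase (look-ahead before every character).
TEMPERED = re.compile(r"^((?!mak(e|ing) ?it ?cheaper).)*$")
# Option 2, alternative: a single look-ahead with a leading .*
PREFIXED = re.compile(r"^(?!.*mak(e|ing) ?it ?cheaper).*")


def accepted(pattern, text):
    """True if the whole string matches, i.e. the string is NOT excluded."""
    return pattern.fullmatch(text) is not None


samples = [
    "make it cheaper",
    "make it cheaperblahblah",
    "www.make it cheaper",
    "something else",
]

for text in samples:
    print(text, accepted(EXACT, text), accepted(TEMPERED, text), accepted(PREFIXED, text))

# With the anchors in place, a search (find) and a whole-string match agree.
find_and_matches_agree = all(
    (p.search(t) is not None) == (p.fullmatch(t) is not None)
    for p in (EXACT, TEMPERED, PREFIXED)
    for t in samples
)
```

One thing this shows: option 1 still accepts "www.make it cheaper", which is on the question's list, so option 2 looks closer to what was asked for.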


1: Although they may not be called that, many languages have functions equivalent to matches and find with regex.


The above is the strictly-regex answer to this question.

A better approach might be to stick to the original regex (mak(e|ing) ?it ?cheaper) and see if you can negate the matches directly with the tool or language you're using.

In Java, for example, this would involve doing if (!string.matches(originalRegex)) (note the !, which negates the returned boolean) instead of if (string.matches(negLookRegex)).
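The same idea sketched in Python, as an illustration only (the function names and the candidate strings are invented for the example). Keep in mind that Java's String.matches tests the whole string, so the Java line above corresponds to the `fullmatch` variant here. To exclude strings that merely contain the phrase, negate a search instead.

```python
import re

# The original, un-negated regex from the question.
ORIGINAL = re.compile(r"mak(e|ing) ?it ?cheaper")


def keep_if_not_containing(strings):
    """Keep strings in which the pattern occurs nowhere."""
    return [s for s in strings if not ORIGINAL.search(s)]


def keep_if_not_exactly(strings):
    """Keep strings that are not, as a whole, one of the phrases."""
    return [s for s in strings if not ORIGINAL.fullmatch(s)]


candidates = [
    "make it cheaper",
    "makeitcheaper.com.au",
    "make it dearer",
    "example.com",
]

print(keep_if_not_containing(candidates))
print(keep_if_not_exactly(candidates))
```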

Bernhard Barker answered Oct 18 '22 16:10



A negative lookahead, I believe, is what you're looking for. Maybe try:

(?!.*mak(e|ing) ?it ?cheaper)

And maybe a bit more flexible:

(?!.*mak(e|ing) *it *cheaper)

Just in case there is more than one space.
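A quick Python sketch of the difference between the two. I've added a leading `^` to both patterns, which is my assumption and not part of the patterns above. Without it, a search could succeed at a later position in the string, as the other answer explains.

```python
import re

# Both patterns are anchored with ^ for this example (an addition of mine).
SINGLE_SPACE = re.compile(r"^(?!.*mak(e|ing) ?it ?cheaper)")
MANY_SPACES = re.compile(r"^(?!.*mak(e|ing) *it *cheaper)")


def is_allowed(pattern, text):
    """True if the negative look-ahead passes, i.e. the string is NOT excluded."""
    return pattern.search(text) is not None


for text in ["make it cheaper", "make   it  cheaper", "hello"]:
    print(repr(text), is_allowed(SINGLE_SPACE, text), is_allowed(MANY_SPACES, text))
```

With several spaces between the words, only the ` *` version excludes the string.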

Jerry answered Oct 18 '22 15:10
