Regex: Matching by exclusion, without look-ahead - is it possible?

Q: Does the regex dot (.) match any character?

Matching a single character: the dot (.) in a regular expression matches a single character without regard to what character it is. The matched character can be a letter, a digit, or any special character. In most flavors, however, the dot does not match a newline unless a "dot matches all" (single-line) option is enabled.
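A quick way to see this is Python's re module, used here only as one example flavor: . stands for exactly one character, and a newline is excluded unless re.DOTALL is passed.

```python
import re

# "." stands for exactly one character: a letter, a digit or a symbol all work,
# but it cannot match zero characters, so "ac" fails.
for text in ["abc", "a1c", "a#c", "ac"]:
    print(repr(text), bool(re.fullmatch(r"a.c", text)))

# By default "." does not match a newline; re.DOTALL lifts that restriction.
print(bool(re.fullmatch(r"a.c", "a\nc")))             # False
print(bool(re.fullmatch(r"a.c", "a\nc", re.DOTALL)))  # True
```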

Q: Can I use negative lookbehind?

In JavaScript regular expressions (since ES2018), the positive lookbehind ((?<=...)) and negative lookbehind ((?<!...)) zero-width assertions can be used to ensure that a pattern is, or is not, preceded by another pattern. Support varies between regex flavors.
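Python's re module also supports both assertions, with the restriction that the lookbehind pattern must have a fixed width. A small sketch with a made-up input string:

```python
import re

text = "cost $42, qty 7"

# Positive lookbehind: digits that ARE preceded by "$".
print(re.findall(r"(?<=\$)\d+", text))    # ['42']

# Negative lookbehind: digits that are NOT preceded by "$".
# The \b stops a match from starting in the middle of "42" (at the "2").
print(re.findall(r"\b(?<!\$)\d+", text))  # ['7']
```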

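Q: What does the regex 0*1*0*1* mean?

Matched against a whole string, 0*1*0*1* accepts a run of 0s, then a run of 1s, then a run of 0s, then a run of 1s, where each run may be empty: it matches 0101, 0011 and the empty string, but not 10101.

A related expression in formal-language notation is (0+1)*1(0+1)*. There, + means union, so (0+1) means 0 OR 1 and (0+1)* matches any sequence of zeroes and ones. (0+1)*1(0+1)* therefore matches any such sequence that contains at least one 1: it would not match 000, but it would match 010, 1, 111, etc. In regex syntax, where + means "one or more", the same set is written [01]*1[01]*.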
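Both readings can be checked with whole-string matching in Python's re (fullmatch, so that a partial or empty match elsewhere in the string does not count). The sample strings are the ones mentioned in the text.

```python
import re

# Runs of 0s, 1s, 0s, 1s, each of which may be empty.
runs = re.compile(r"0*1*0*1*")
# (0+1)*1(0+1)* written in practical regex syntax.
has_one = re.compile(r"[01]*1[01]*")

for s in ["", "0101", "0011", "10101"]:
    print(repr(s), bool(runs.fullmatch(s)))

for s in ["000", "010", "1", "111"]:
    print(repr(s), bool(has_one.fullmatch(s)))
```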

Tags:

regex

regex-negation

In some regex flavors, [negative] zero-width assertions (look-ahead/look-behind) are not supported.

This makes it extremely difficult (impossible?) to state an exclusion, for example "every line that does not have foo on it", which with look-ahead can be written like this:

^((?!foo).)*$

Can the same thing be achieved without using look-around at all (complexity and performance concerns set aside for the moment)?
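For reference, in a flavor that does support look-around (Python's re is used here as an example), the pattern from the question behaves as follows on a few of the sample lines used later on this page:

```python
import re

# The look-ahead pattern from the question: every consumed character must be
# one at which "foo" does not start.
NOT_FOO_LOOKAHEAD = re.compile(r"^((?!foo).)*$")

lines = ["abcde", "abdfoode", "fo", "ofoo", ""]
print([line for line in lines if NOT_FOO_LOOKAHEAD.match(line)])
# ['abcde', 'fo', '']
```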

asked Jan 21 '09 16:01

Tomalak

2 Answers

UPDATE: As @Ciantic pointed out in the comments, the regex below fails when there are two f's before "oo" (e.g. "ffoo").

^(f(o[^o]|[^o])|[^f])*$
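A short Python check (Python's re is just a convenient flavor for the experiment) reproduces the problem from the update. The mechanism is that [^o] can swallow the next "f", so a "foo" that starts at that "f" is never examined; "fofoo" is a further input accepted for the same reason, although it is not the one mentioned in the comments.

```python
import re

# jfs's pattern, applied to lines that still carry their trailing newline.
JFS_PATTERN = re.compile(r"^(f(o[^o]|[^o])|[^f])*$")

for line in ["foo\n", "fo\n", "ffoo\n", "fofoo\n"]:
    verdict = "accepted" if JFS_PATTERN.match(line) else "rejected"
    print(repr(line), verdict)
```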

NOTE: It is much easier to just negate a match on the client side (e.g. grep -v foo) than to use the regex above.
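Negating the match in code takes only a line or two. A minimal Python sketch (the helper name drop_matching_lines is hypothetical, chosen for this illustration) that keeps the lines on which a pattern does not match, the way grep -v does:

```python
import re

def drop_matching_lines(pattern, lines):
    """Keep the lines on which `pattern` does NOT match (what grep -v does)."""
    compiled = re.compile(pattern)
    return [line for line in lines if not compiled.search(line)]

print(drop_matching_lines("foo", ["abcde", "ffoo", "fofoo", "fo", "ofo"]))
# ['abcde', 'fo', 'ofo']
```

Unlike the hand-written exclusion regexes, this also handles inputs such as "ffoo" correctly, because it only ever asks the simple question "does foo occur?".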

The regex assumes that each line ends with a newline character; if it does not, see the C++ and grep versions below.

Sample programs in Perl, Python, C++, and grep all give the same output.

perl

#!/usr/bin/perl -wn
print if /^(f(o[^o]|[^o])|[^f])*$/;

python

#!/usr/bin/env python3
import fileinput
import re
import sys

# fileinput keeps each line's trailing newline, which [^o] can consume
re_not_foo = re.compile(r"^(f(o[^o]|[^o])|[^f])*$")
for line in filter(re_not_foo.match, fileinput.input()):
    sys.stdout.write(line)

c++

#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main()
{
  boost::regex re("^(f(o([^o]|$)|([^o]|$))|[^f])*$");
  // NOTE: the "|$" alternatives are needed because getline() strips the newline character

  std::string line;
  while (std::getline(std::cin, line)) 
    if (boost::regex_match(line, re))
      std::cout << line << std::endl;
}

grep

$ grep "^\(f\(o\([^o]\|$\)\|\([^o]\|$\)\)\|[^f]\)*$" in.txt

Sample file:

foo
'foo'
abdfoode
abdfode
abdfde
abcde
f

fo
foo
fooo
ofooa
ofo
ofoo

Output:

abdfode
abdfde
abcde
f

fo
ofo
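To check this output without Boost or GNU grep, the newline-free variant from the C++ and grep programs can be run through Python's re as a sketch, applied to the sample lines with their newlines already stripped (as getline() does).

```python
import re

# Python port of the C++/grep pattern: the "|$" alternatives let a line end
# right after "f" or "fo" when the newline has already been stripped.
NOT_FOO_STRIPPED = re.compile(r"^(f(o([^o]|$)|([^o]|$))|[^f])*$")

SAMPLE = ["foo", "'foo'", "abdfoode", "abdfode", "abdfde", "abcde", "f", "",
          "fo", "foo", "fooo", "ofooa", "ofo", "ofoo"]

print([line for line in SAMPLE if NOT_FOO_STRIPPED.match(line)])
# ['abdfode', 'abdfde', 'abcde', 'f', '', 'fo', 'ofo']

# It inherits the weakness noted in the UPDATE: "ffoo" is still accepted.
print(bool(NOT_FOO_STRIPPED.match("ffoo")))  # True
```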

answered Oct 25 '22 20:10

jfs

I came across this question and took the fact that there wasn't a fully working regex as a personal challenge. I believe I've managed to create a regex that works for all inputs, provided you can use atomic grouping or possessive quantifiers.

Of course, I'm not sure if there are any flavours that allow atomic grouping but not lookaround, but the Question asked if it's possible in regex to state an exclusion without lookaround, and it is technically possible:

\A(?:$|[^f]++|f++(?:[^o]|$)|(?:f++o)*+(?:[^o]|$))*\Z

Explanation:

\A                         #Start of string
(?:                        #Non-capturing group
    $                      #Consume end-of-line. We're not in foo-mode.
    |[^f]++                #Consume every non-'f'. We're not in foo-mode.
    |f++(?:[^o]|$)          #Enter foo-mode with an 'f'. Consume all 'f's, but only exit foo-mode if 'o' is not the next character. Thus, 'f' is valid but 'fo' is invalid.
    |(?:f++o)*+(?:[^o]|$)  #Enter foo-mode with an 'f'. Consume all 'f's, followed by a single 'o'. Repeat, since '(f+o)*' by itself cannot contain 'foo'. Only exit foo-mode if 'o' is not the next character following (f+o). Thus, 'fo' is valid but 'foo' is invalid.
)*                         #Repeat the non-capturing group
\Z                         #End of string. Note that this regex only works in flavours that can match $\Z

If, for whatever reason, you can use atomic grouping but neither possessive quantifiers nor lookaround, you can use:

\A(?:$|(?>[^f]+)|(?>f+)(?:[^o]|$)|(?>(?:(?>f+)o)*)(?:[^o]|$))*\Z
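Both patterns can be tried in Python, assuming Python 3.11 or later, which is when the re module gained possessive quantifiers and atomic groups; on older versions the sketch below deliberately refuses to run. The helper name accepted is hypothetical, and the sample lines (including the "ffoo" and "fofoo" inputs that trip up the earlier answer) are illustrative.

```python
import re
import sys

# Sarov's two patterns, copied verbatim. Python's re accepts possessive
# quantifiers (++, *+) and atomic groups (?>...) only from version 3.11.
POSSESSIVE = r"\A(?:$|[^f]++|f++(?:[^o]|$)|(?:f++o)*+(?:[^o]|$))*\Z"
ATOMIC = r"\A(?:$|(?>[^f]+)|(?>f+)(?:[^o]|$)|(?>(?:(?>f+)o)*)(?:[^o]|$))*\Z"


def accepted(pattern, lines):
    """Return the lines (without trailing newlines) that `pattern` matches."""
    if sys.version_info < (3, 11):
        raise RuntimeError("this sketch needs Python 3.11+ for ++, *+ and (?>...)")
    compiled = re.compile(pattern)
    return [line for line in lines if compiled.match(line)]


if sys.version_info >= (3, 11):
    sample = ["foo", "abdfoode", "abdfode", "f", "", "fo", "ofo", "ffoo", "fofoo", "fof"]
    print(accepted(POSSESSIVE, sample))  # ['abdfode', 'f', '', 'fo', 'ofo', 'fof']
    print(accepted(ATOMIC, sample))      # same result
```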

As others point out, though, it's probably more practical to just negate a match through other means.
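Strictly speaking, even atomic groups are not required: the set of lines that avoid a fixed string is a regular language, so a plain pattern for it can be derived from a small automaton. The sketch below is an illustrative pattern that is not from either answer. It uses only grouping, alternation and quantifiers, and is checked here only in Python's re and only against short strings; whether it behaves identically in other flavors is an assumption.

```python
import itertools
import re

# Built from a 3-state automaton that remembers how much of "foo" was just
# seen: nothing, "f", or "fo".
#   [^f]                 stay in the "nothing seen" state
#   f(?:f|of)*o?[^fo]    enter the "f"/"fo" states, then leave on a character
#                        that cannot continue "foo"
#   (?:f(?:f|of)*o?)?    the line may also end while in the "f" or "fo" state
NOT_FOO = re.compile(r"(?:[^f]|f(?:f|of)*o?[^fo])*(?:f(?:f|of)*o?)?")


def has_no_foo(line):
    """True if the whole line (without its newline) avoids "foo"."""
    return NOT_FOO.fullmatch(line) is not None


# Exhaustive sanity check against plain substring search for every string
# of length 0 to 7 over the alphabet {f, o, x}.
for length in range(8):
    for chars in itertools.product("fox", repeat=length):
        s = "".join(chars)
        assert has_no_foo(s) == ("foo" not in s), s
```

In practice this construction grows quickly for longer excluded strings, which is another reason to prefer negating the match in code.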

answered Oct 25 '22 18:10

Sarov
