
Java regex: Negative lookahead

I'm trying to craft two regular expressions that will match URIs. These URIs are of the format: /foo/someVariableData and /foo/someVariableData/bar/someOtherVariableData

I need two regexes. Each needs to match one but not the other.

The regexes I originally came up with are: /foo/.+ and /foo/.+/bar/.+ respectively.

I think the second regex is fine: it will only match the second string. The first regex, however, matches both. So I started playing around (for the first time) with negative lookahead. I designed the regex /foo/.+(?!bar) and set up the following code to test it:

    public static void main(String[] args) {
        String shouldWork = "/foo/abc123doremi";
        String shouldntWork = "/foo/abc123doremi/bar/def456fasola";
        String regex = "/foo/.+(?!bar)";
        System.out.println("ShouldWork: " + shouldWork.matches(regex));
        System.out.println("ShouldntWork: " + shouldntWork.matches(regex));
    }

And, of course, both of them resolve to true.

Anybody know what I'm doing wrong? I don't need to use Negative lookahead necessarily, I just need to solve the problem, and I think that negative lookahead might be one way to do it.

Thanks,

Cody S asked Jun 20 '12 18:06





1 Answer

Try

String regex = "/foo/(?!.*bar).+"; 

or possibly

String regex = "/foo/(?!.*\\bbar\\b).+"; 

to avoid rejecting paths like /foo/baz/crowbars, which I assume you do want that regex to match.
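As a quick check, the two suggested regexes can be run against the strings from the question plus the crowbars path. This is only a sketch: the class name FooUriRegexDemo and the constant names are made up for illustration, and String.matches is used because the question's test code uses it (it requires the whole string to match).

```java
class FooUriRegexDemo {
    // The two regexes suggested above; the constant names are hypothetical.
    static final String NO_BAR = "/foo/(?!.*bar).+";
    static final String NO_WHOLE_WORD_BAR = "/foo/(?!.*\\bbar\\b).+";

    public static void main(String[] args) {
        String[] uris = {
            "/foo/abc123doremi",
            "/foo/abc123doremi/bar/def456fasola",
            "/foo/baz/crowbars"
        };
        for (String uri : uris) {
            // String.matches succeeds only if the regex matches the entire string.
            System.out.println(uri
                    + " -> plain: " + uri.matches(NO_BAR)
                    + ", word-boundary: " + uri.matches(NO_WHOLE_WORD_BAR));
        }
    }
}
```

The first URI is accepted by both regexes and the second is rejected by both. The crowbars path is rejected by the plain version (it contains the letters "bar") but accepted by the word-boundary version.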

Explanation: (without the double backslashes required by Java strings)

    /foo/     # Match "/foo/"
    (?!       # Assert that it's impossible to match the following regex here:
      .*      #   any number of characters
      \b      #   followed by a word boundary
      bar     #   followed by "bar"
      \b      #   followed by a word boundary.
    )         # End of lookahead assertion
    .+        # Match one or more characters
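It may also help to see why the position of the lookahead matters. In the question's regex /foo/.+(?!bar), the greedy .+ first consumes everything up to the end of the input, and only then is (?!bar) checked; nothing follows at that point, so the assertion succeeds. The sketch below illustrates this with a hypothetical helper, firstMatch, which is not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyLookaheadDemo {
    // Hypothetical helper: returns the text of the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String twoLevel = "/foo/abc123doremi/bar/def456fasola";

        // Lookahead after .+ : the greedy .+ runs to the end of the input,
        // where no "bar" can follow, so the whole string is matched.
        System.out.println(firstMatch("/foo/.+(?!bar)", twoLevel));

        // Lookahead before .+ : the rest of the input is scanned for "bar"
        // before anything is consumed, so no match is found (prints null).
        System.out.println(firstMatch("/foo/(?!.*bar).+", twoLevel));
    }
}
```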

\b, the "word boundary anchor", matches the zero-width position between a word character (a letter, digit, or underscore) and a non-word character (or between the start/end of the string and a word character). Therefore, it matches before the b and after the r in "/bar/", but it fails to match between w and b in "crowbar".
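The word-boundary behaviour can be checked in isolation. The following is a minimal sketch; the class WordBoundaryDemo and the method containsWholeWordBar are illustrative names, not something from the answer.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // "bar" as a whole word only; backslashes are doubled for the Java string literal.
    static final Pattern WHOLE_BAR = Pattern.compile("\\bbar\\b");

    // Hypothetical helper: true if the path contains "bar" delimited by word boundaries.
    static boolean containsWholeWordBar(String path) {
        return WHOLE_BAR.matcher(path).find();
    }

    public static void main(String[] args) {
        // "/" is a non-word character, so there is a boundary on both sides of "bar".
        System.out.println(containsWholeWordBar("/foo/abc/bar/def"));   // true
        // "w" and "b" are both word characters, so there is no boundary before "bar".
        System.out.println(containsWholeWordBar("/foo/baz/crowbars"));  // false
        // The end of the input also counts as a boundary after "r".
        System.out.println(containsWholeWordBar("/foo/bar"));           // true
    }
}
```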

Protip: Take a look at http://www.regular-expressions.info - a great regex tutorial.

Tim Pietzcker answered Sep 19 '22 18:09
