I'm trying to craft two regular expressions that will match URIs. These URIs are of the format: /foo/someVariableData
and /foo/someVariableData/bar/someOtherVariableData
I need two regexes. Each needs to match one but not the other.
The regexes I originally came up with are: /foo/.+
and /foo/.+/bar/.+
respectively.
I think the second regex is fine. It will only match the second string. The first regex, however, matches both. So, I started playing around (for the first time) with negative lookahead. I designed the regex /foo/.+(?!bar)
and set up the following code to test it:
public static void main(String[] args) {
    String shouldWork = "/foo/abc123doremi";
    String shouldntWork = "/foo/abc123doremi/bar/def456fasola";
    String regex = "/foo/.+(?!bar)";
    System.out.println("ShouldWork: " + shouldWork.matches(regex));
    System.out.println("ShouldntWork: " + shouldntWork.matches(regex));
}
And, of course, both of them resolve to true.
Does anybody know what I'm doing wrong? I don't necessarily need to use negative lookahead. I just need to solve the problem, and I think negative lookahead might be one way to do it.
Thanks,
Because the lookahead is negative, this means that the lookahead has successfully matched at the current position. At this point, the entire regex has matched, and q is returned as the match.
It indicates that the subpattern is a non-capture subpattern. That means whatever is matched in (?:\w+\s) , even though it's enclosed by () it won't appear in the list of matches, only (\w+) will.
A negative lookbehind assertion asserts true if the pattern inside the lookbehind is not matched.
The Match-zero-or-more Operator ( * ) This operator repeats the smallest possible preceding regular expression as many times as necessary (including zero) to match the pattern. `*' represents this operator. For example, `o*' matches any string made up of zero or more `o' s.
Try
String regex = "/foo/(?!.*bar).+";
or possibly
String regex = "/foo/(?!.*\\bbar\\b).+";
to avoid failures on paths like /foo/baz/crowbars, which I assume you do want that regex to match.
Explanation (without the double backslashes required by Java strings):
/foo/    # Match "/foo/"
(?!      # Assert that it's impossible to match the following regex here:
 .*      # any number of characters
 \b      # followed by a word boundary
 bar     # followed by "bar"
 \b      # followed by a word boundary.
)        # End of lookahead assertion
.+       # Match one or more characters
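To see why moving the lookahead matters, here is a minimal sketch that runs the question's original regex and the suggested one against the two sample URIs from the question. The class name LookaheadDemo and its helper methods are made up for illustration. In the original regex the greedy .+ consumes the rest of the string first, so (?!bar) is only checked at the very end, where nothing follows and the assertion trivially succeeds.

```java
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Original attempt from the question: the lookahead is evaluated only
    // after .+ has already consumed the rest of the input.
    static final Pattern ORIGINAL = Pattern.compile("/foo/.+(?!bar)");

    // Suggested fix: the lookahead runs right after "/foo/" and scans the
    // whole remainder for "bar" before .+ consumes anything.
    static final Pattern FIXED = Pattern.compile("/foo/(?!.*bar).+");

    // Both helpers require the entire input to match, like String.matches.
    static boolean matchesOriginal(String uri) {
        return ORIGINAL.matcher(uri).matches();
    }

    static boolean matchesFixed(String uri) {
        return FIXED.matcher(uri).matches();
    }

    public static void main(String[] args) {
        String shouldWork = "/foo/abc123doremi";
        String shouldntWork = "/foo/abc123doremi/bar/def456fasola";

        System.out.println("original, ShouldWork:   " + matchesOriginal(shouldWork));   // true
        System.out.println("original, ShouldntWork: " + matchesOriginal(shouldntWork)); // true
        System.out.println("fixed, ShouldWork:      " + matchesFixed(shouldWork));      // true
        System.out.println("fixed, ShouldntWork:    " + matchesFixed(shouldntWork));    // false
    }
}
```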
\b, the "word boundary anchor", matches the empty space between an alphanumeric character and a non-alphanumeric character (or between the start/end of the string and an alphanumeric character). Therefore, it matches before the b or after the r in "bar", but it fails to match between w and b in "crowbar".
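The effect of the word boundaries can be checked with a small sketch along the same lines. The class name WordBoundaryDemo and its helpers are hypothetical, and the test paths are the crowbars example from above plus the second URI from the question.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Rejects any path after "/foo/" that contains the letters "bar" anywhere.
    static final Pattern PLAIN = Pattern.compile("/foo/(?!.*bar).+");

    // Rejects only paths that contain "bar" as a whole word.
    static final Pattern BOUNDED = Pattern.compile("/foo/(?!.*\\bbar\\b).+");

    static boolean matchesPlain(String uri) {
        return PLAIN.matcher(uri).matches();
    }

    static boolean matchesBounded(String uri) {
        return BOUNDED.matcher(uri).matches();
    }

    public static void main(String[] args) {
        String crowbars = "/foo/baz/crowbars";
        String withBar = "/foo/abc123doremi/bar/def456fasola";

        // "crowbars" contains "bar", so the plain version rejects it.
        System.out.println("plain, crowbars:   " + matchesPlain(crowbars));   // false
        // No word boundary between "w" and "b", so the bounded version accepts it.
        System.out.println("bounded, crowbars: " + matchesBounded(crowbars)); // true
        // "/bar/" has a boundary on both sides, so it is still rejected.
        System.out.println("bounded, /bar/:    " + matchesBounded(withBar));  // false
    }
}
```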
Protip: Take a look at http://www.regular-expressions.info - a great regex tutorial.