
Regex match non-greedy on one optional string and greedy on another

Tags: regex, pcre
I've searched for a while and haven't found a way to match the following pattern (I am also very new to regex). The input looks like either

/abc/foo/bar(/*) 

or

/abc/foo/bar/stop

I want to match and capture the above strings as /abc/foo/bar. "/stop" is an optional string that may be appended to the end of the pattern. The goal is to get the desired capture while ignoring "stop" if it is present (and, if "stop" occurs multiple times, to stop at the first one), while allowing any number of slashes in the middle but excluding any slashes at the end of the line.

If I simply do:

^(/.*[^/])/*$

This is greedy: it includes all slashes and then strips off any trailing ones. But to accept the second case, where there is an optional "/stop", I need to match non-greedily until I find the first "/stop" and stop there.
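
To make the problem concrete, here is a small JavaScript sketch of the greedy pattern applied to one string of each kind. The helper name `capture` is made up for illustration, and the slashes are escaped only because the pattern is written as a regex literal.

```javascript
// The greedy pattern from the question, written as a JavaScript regex literal.
const greedy = /^(\/.*[^\/])\/*$/;

// Hypothetical helper: return capture group 1, or null when nothing matches.
function capture(re, s) {
  const m = s.match(re);
  return m ? m[1] : null;
}

// Trailing slashes are stripped as intended.
console.log(capture(greedy, "/abc/foo/bar///"));
// The optional "/stop" stays inside the capture, which is the problem.
console.log(capture(greedy, "/abc/foo/bar/stop"));
```

The first call yields "/abc/foo/bar", while the second yields the whole "/abc/foo/bar/stop".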

How can I craft a single regex that matches both cases?

EDIT: In case my previous example wasn't clear enough, here are more. I want to match and capture "/abc/foo/bar" in all of the following strings:

/abc/foo/bar
/abc/foo/bar/
/abc/foo/bar///
/abc/foo/bar/stop
/abc/foo/bar/stop/foo/bar/stop/stop
/abc/foo/bar//stop

For the following strings, it should capture the whole string instead:

/abc/foo/bar/sto (captures the whole "/abc/foo/bar/sto")
/abc/foo/bar/abc/foo/bar (captures the whole "/abc/foo/bar/abc/foo/bar")

Let me know if this is clear enough. Thanks!

asked Jul 11 '14 17:07 by Superziyi


2 Answers

Try this:

/^(?:\/+(?!$|(?:stop\/?))[^\/]+)*/

Explanation:

This matches the start of the string (^), followed by zero or more instances of the following pattern:

  • one or more slashes (\/+) that are not followed by the end of the string ($) or by stop, followed by
  • one or more non-slash characters ([^\/]+)
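
As a rough check of the explanation above, the following JavaScript sketch runs this first pattern over the strings from the question. The helper name `prefix` is made up for illustration. One extra input, "/abc/foo/bar/stopper", is an assumption added here to show that the lookahead only checks whether the next segment starts with stop.

```javascript
// First pattern from the answer above.
const re = /^(?:\/+(?!$|(?:stop\/?))[^\/]+)*/;

// Hypothetical helper: return the matched prefix of s.
// The pattern can match the empty string, so match() never returns null here.
function prefix(s) {
  return s.match(re)[0];
}

const inputs = [
  "/abc/foo/bar",
  "/abc/foo/bar/",
  "/abc/foo/bar///",
  "/abc/foo/bar/stop",
  "/abc/foo/bar/stop/foo/bar/stop/stop",
  "/abc/foo/bar//stop",
  "/abc/foo/bar/sto",
  "/abc/foo/bar/abc/foo/bar",
  "/abc/foo/bar/stopper", // extra case: segment merely starts with "stop"
];

for (const s of inputs) {
  console.log(s, "->", prefix(s));
}
```

The first six inputs give "/abc/foo/bar" and the next two give the whole string. The extra "stopper" input also gives "/abc/foo/bar", so this pattern treats any segment beginning with stop as the stop marker.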

EDIT: Here is an alternative, arguably simpler, regex:

/^.+?(?=\/*$|\/+stop\b)/

This matches one or more characters in a non-greedy manner, then stops when whatever is after the match is one of the following:

  1. the end of the string ($), possibly preceded by one or more slashes (\/*)
  2. one or more slashes, the word stop, and a word break.
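
The word break in the second condition matters for segments that merely begin with stop. Here is a minimal sketch of that behavior; the inputs "stopper" and "stop-here" and the helper name `prefix` are made up for illustration.

```javascript
// Second pattern from the answer above.
const re = /^.+?(?=\/*$|\/+stop\b)/;

// Hypothetical helper: return the matched prefix, or null when nothing matches.
function prefix(s) {
  const m = s.match(re);
  return m ? m[0] : null;
}

// "stop" followed by the end of the string or a slash: a word break follows.
console.log(prefix("/abc/foo/bar/stop"));
console.log(prefix("/abc/foo/bar/stop/"));
// "stopper": no word break after "stop", so the whole string is matched.
console.log(prefix("/abc/foo/bar/stopper"));
// "stop-here": "-" is not a word character, so a word break follows "stop".
console.log(prefix("/abc/foo/bar/stop-here"));
```

So "stopper" is not treated as a stop marker, but "stop-here" is, because `\b` only distinguishes word characters (letters, digits, underscore) from everything else.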

EDIT 2: To test this, here's a simple JavaScript snippet that runs the second regex above against various test strings and logs the results to the console:

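// Second regex from above: lazy match up to trailing slashes or the first "/stop".
var re = /^.+?(?=\/*$|\/+stop\b)/,
    test_strings = ["/abc/foo/bar",
                    "/abc/foo/bar/",
                    "/abc/foo/bar///",
                    "/abc/foo/bar/stop",
                    "/abc/foo/bar/stop/foo/bar/stop/stop",
                    "/abc/foo/bar//stop",
                    "/abc/foo/bar/sto",
                    "/abc/foo/bar/abc/foo/bar"];
for (var s = 0; s < test_strings.length; s++) {
    var m = test_strings[s].match(re);
    // match() returns null when nothing matches, so guard before indexing
    console.log(m ? m[0] : "(no match)");
}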

/*
Results:

/abc/foo/bar
/abc/foo/bar
/abc/foo/bar
/abc/foo/bar
/abc/foo/bar
/abc/foo/bar
/abc/foo/bar/sto
/abc/foo/bar/abc/foo/bar 

*/
answered Sep 28 '22 21:09 by elixenide


You can try something like this:

^((?:/[^/]+)+?)(?:/*|/+stop(?:/.*)?)$
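
The sketch below compares two spellings of the first alternative in this capturing pattern, `/+` and `/*`, on the strings from the question. With `/+` at least one trailing slash is required, so a string such as "/abc/foo/bar" does not match at all; with `/*` it does. The names `strict`, `relaxed` and `capture` are made up for illustration, and the slashes are escaped because the patterns are written as JavaScript regex literals.

```javascript
// Two spellings of the capturing pattern: "/+" versus "/*" in the first alternative.
const strict  = /^((?:\/[^\/]+)+?)(?:\/+|\/+stop(?:\/.*)?)$/;
const relaxed = /^((?:\/[^\/]+)+?)(?:\/*|\/+stop(?:\/.*)?)$/;

// Hypothetical helper: return capture group 1, or null when nothing matches.
function capture(re, s) {
  const m = s.match(re);
  return m ? m[1] : null;
}

const inputs = [
  "/abc/foo/bar",
  "/abc/foo/bar/",
  "/abc/foo/bar///",
  "/abc/foo/bar/stop",
  "/abc/foo/bar/stop/foo/bar/stop/stop",
  "/abc/foo/bar//stop",
  "/abc/foo/bar/sto",
  "/abc/foo/bar/abc/foo/bar",
];

for (const s of inputs) {
  console.log(s, "| /+ :", capture(strict, s), "| /* :", capture(relaxed, s));
}
```

With `/*`, the first six strings capture "/abc/foo/bar" and the last two capture the whole string, which is what the question asks for.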

And if atomic groups are available, you'd better write:

^((?:/[^/]+)+?)(?>/*$|/+stop(?:/.*)?)

If lookaheads are available:

^/(?>[^/]+|/(?!/*(?:$|stop(?:/|$))))+

PS: Don't forget to escape the slashes if your regex delimiters are slashes.

As Ed Cottrell notes, features like atomic grouping are not available in languages like JavaScript or in Python's re module. However, the feature can be emulated efficiently using the fact that a lookahead is naturally atomic: (?>a+) <=> (?=(a+))\1
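
Here is a minimal JavaScript sketch of that emulation. The first part contrasts a plain greedy `a+` with the lookahead-plus-backreference form on a made-up input, "aaab". The second part applies the same trick to the lookahead-based pattern above; that rewritten pattern and the helper name `prefix` are my own adaptation for illustration, not something taken from the answer.

```javascript
// Plain greedy a+ can give back one "a", so this matches "aaab".
const backtracking = /a+ab/;
// Emulated (?>a+): the lookahead captures the longest run of "a", \1 consumes it,
// and the engine cannot backtrack into the lookahead to shorten the capture.
const atomicLike = /(?=(a+))\1ab/;

console.log(backtracking.test("aaab"));
console.log(atomicLike.test("aaab"));

// The lookahead-based pattern, with (?>X) rewritten as (?:(?=(X))\1).
const re = /^\/(?:(?=([^\/]+|\/(?!\/*(?:$|stop(?:\/|$)))))\1)+/;

// Hypothetical helper: return the matched prefix, or null when nothing matches.
function prefix(s) {
  const m = s.match(re);
  return m ? m[0] : null;
}

console.log(prefix("/abc/foo/bar///"));
console.log(prefix("/abc/foo/bar//stop"));
console.log(prefix("/abc/foo/bar/sto"));
```

The plain pattern matches "aaab" and the emulated atomic one does not, which is the behavior an atomic group would give. The rewritten pattern returns "/abc/foo/bar" for the first two inputs and the whole string for the third.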

answered Sep 28 '22 20:09 by Casimir et Hippolyte