
Matching all of a certain character after a Positive Lookbehind

I have been trying to get the regex right for this all morning and I have hit a wall. In the following string I want to match every forward slash that follows .com/<first_word>, with the exception of any / after the URL.

$string = "http://example.com/foo/12/jacket Input/Output";
    match------------------------^--^

The length of the words between slashes should not matter.

Regex: (?<=.com\/\w)(\/) results:

$string = "http://example.com/foo/12/jacket Input/Output"; // no match
$string = "http://example.com/f/12/jacket Input/Output";   
    matches--------------------^

Regex: (?<=\/\w)(\/) results:

$string = "http://example.com/foo/20/jacket Input/O/utput"; // misses the /'s in the URL
    matches----------------------------------------^
$string = "http://example.com/f/2/jacket Input/O/utput"; // don't want the match between Input/Output
    matches--------------------^-^--------------^                    

Because a lookbehind cannot contain variable-length quantifiers and has to be a fixed-length, zero-width assertion, I am wondering whether I have gone down the wrong path and should look for another regex combination.
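
For readers who want to reproduce the behaviour described above, here is a minimal sketch. The variable names are illustrative only, and the \w+ pattern is a hypothetical variant that does not appear in the question; it is included to show that PCRE refuses a lookbehind of unlimited length.

```php
<?php
// The two sample strings from the question.
$long  = 'http://example.com/foo/12/jacket Input/Output';
$short = 'http://example.com/f/12/jacket Input/Output';

// First attempt: the lookbehind covers exactly one word character after ".com/".
$fixed = '~(?<=.com/\w)(/)~';

// "foo" is three characters long, so the slash after it is never matched.
$longCount = preg_match_all($fixed, $long);

// With the one-letter segment "f" the slash right after it is matched.
$shortCount  = preg_match_all($fixed, $short, $m, PREG_OFFSET_CAPTURE);
$shortOffset = $m[0][0][1]; // byte offset of the matched slash

// Hypothetical variant: \w+ makes the lookbehind unbounded, so the pattern
// fails to compile and preg_match() returns false (warning silenced with @).
$unbounded = @preg_match('~(?<=\.com/\w+)/~', $long);

var_dump($longCount, $shortCount, $shortOffset, $unbounded);
```

This is why extending the lookbehind to cover a word of arbitrary length is not an option in the preg_* functions.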

Is the positive lookbehind the right way to do this? Or am I missing something other than copious amounts of coffee?

NOTE: tagged with PHP because the regex should work in any of the preg_* functions.

Jay Blanchard asked Feb 11 '16 17:02


3 Answers

If you want to use preg_replace then this regex should work:

$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";
echo preg_replace($re, '|', $str);
//=> http://example.com/foo|12|jacket Input/Output

This replaces every / with a |, except for the first / that follows the leading .com.

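The negative lookbehind (?<!^) stops \G from matching at the very start of the string. Without it, a string that does not begin with a .com URL, such as /foo/bar/baz/abcd, would have its slashes replaced as well.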
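
As a rough illustration of that point, the sketch below runs the regex from this answer next to a hypothetical variant with the (?<!^) guard removed. The variable names and the guard-less pattern are assumptions made for the comparison and are not part of the answer.

```php
<?php
// Regex from the answer: \G may only continue a previous match.
$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';

// Hypothetical variant without the guard, for comparison only.
$noGuard = '~(?:^.*?\.com/|\G)[^/\h]*\K/~';

$url   = 'http://example.com/foo/12/jacket Input/Output';
$plain = '/foo/bar/baz/abcd';

// Slashes after the first path segment of the URL are replaced.
$urlResult = preg_replace($re, '|', $url);

// No leading .com URL: the guarded regex leaves the string alone.
$guarded = preg_replace($re, '|', $plain);

// Without the guard, \G matches at offset 0 and every slash is replaced.
$unguarded = preg_replace($noGuard, '|', $plain);

echo $urlResult, PHP_EOL;  // http://example.com/foo|12|jacket Input/Output
echo $guarded, PHP_EOL;    // /foo/bar/baz/abcd
echo $unguarded, PHP_EOL;  // |foo|bar|baz|abcd
```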


anubhava answered Oct 29 '22 03:10


Use \K here along with \G, then grab the groups.

^.*?\.com\/\w+\K|\G(\/)\w+\K

See demo.

https://regex101.com/r/aT3kG2/6

$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m"; 
$str = "http://example.com/foo/12/jacket Input/Output"; 

preg_match_all($re, $str, $matches);

Replace

$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m"; 
$str = "http://example.com/foo/12/jacket Input/Output"; 
$subst = "|"; 

$result = preg_replace($re, $subst, $str);

vks answered Oct 29 '22 02:10


Another \G and \K based idea.

$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
  • The (?: non-capturing group either sets the entry point with ^\S+\.com/\w or glues the next match to the previous one with \G(?!^).
  • \w*+\K/ possessively matches any number of word characters up to a slash. \K resets the start of the match, so only the slash itself is reported.
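
A short sketch of how this pattern behaves on the string from the question. The variable names are illustrative, and PREG_OFFSET_CAPTURE is used here only to show where each reported slash sits.

```php
<?php
// Regex from this answer.
$re  = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
$str = 'http://example.com/foo/12/jacket Input/Output';

// Collect every match together with its byte offset in $str.
preg_match_all($re, $str, $matches, PREG_OFFSET_CAPTURE);

// Thanks to \K each match is just the slash; keep only the offsets.
$offsets = array_column($matches[0], 1);
print_r($offsets); // 22 and 25: the slashes after "foo" and "12"

// The same pattern works for replacement.
$replaced = preg_replace($re, '|', $str);
echo $replaced, PHP_EOL; // http://example.com/foo|12|jacket Input/Output
```

The slash in Input/Output is skipped because the space after "jacket" breaks the chain of \G matches.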

See demo at regex101

bobble bubble answered Oct 29 '22 02:10