regex - negative lookahead to exclude strings

I am trying to find (and replace with something else) all parts of a text that

  1. start with '/'
  2. end with '/'
  3. contain anything between the two slashes except the strings '.' and '..'.

(For context: I am searching for and replacing directory and file names, which is why '.' and '..' must be excluded.)

This is the regular expression I came up with:

/(?!\.|\.\.)([^/]+)/

The second part

([^/]+)

matches any non-empty sequence of characters other than '/'. No other character restrictions are needed; I am simply interpreting the input.

The first part

(?!\.|\.\.)

uses the negative lookahead assertion to exclude the strings '.' and '..'.
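To see which segments this lookahead actually lets through, the same pattern can be run with PHP's PCRE function preg_match(). This is a sketch under an assumption: it uses PCRE with ~ as the delimiter rather than the mb_ereg functions from the question.

```php
<?php
// The question's pattern, wrapped in ~ delimiters so PCRE's preg_match() can
// run it (assumption: PCRE instead of mb_ereg_replace()).
$pattern = '~/(?!\.|\.\.)([^/]+)/~u';

foreach (['/hi/', '/./', '/../', '/.hi/'] as $input) {
    // preg_match() returns 1 when the pattern matches somewhere, 0 otherwise.
    printf("%s => %d\n", $input, preg_match($pattern, $input));
}
// /hi/ => 1, /./ => 0, /../ => 0, and also /.hi/ => 0: the \. branch alone
// rejects every name that starts with a dot, so the \.\. branch never matters.
```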

However, this doesn't seem to work in PHP with mb_ereg_replace().

Can somebody help me out? I fail to see what's wrong with my regex.

Thank you.

asked Jun 14 '11 at 22:06 by Sevy D.


2 Answers

POSIX regular expressions probably don't support negative lookaheads (I may be wrong, though).

Anyway, since PCRE regexes are usually faster than POSIX ones, I think you can use the PCRE equivalent of the same function, preg_replace(); PCRE supports UTF-8 as well, via the u flag.

Consider this code as a substitute:

preg_replace('~/(?!\.|\.\.)([^/]+)/~u', "", $str);

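EDIT: Because (?!\.) already rejects every name that starts with a dot, the \.\. alternative is redundant, so this is even better: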

preg_replace('~/(?!\.)([^/]+)/~u', "", $str);
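A minimal sketch comparing the two patterns above, assuming PHP 7 or later. The replacement string [X] is an arbitrary placeholder, and the third pattern is a hypothetical variation that is not part of this answer: it checks the closing slash inside the lookahead, (?!\.\.?/), so only the exact names . and .. are excluded.

```php
<?php
// The two patterns from the answer; both are PCRE with the u (UTF-8) flag.
$original   = '~/(?!\.|\.\.)([^/]+)/~u';
$simplified = '~/(?!\.)([^/]+)/~u';

// Hypothetical variation (not from the answer): the lookahead also checks the
// closing slash, so only "." and ".." are excluded and ".hidden" is allowed.
$anchored   = '~/(?!\.\.?/)([^/]+)/~u';

$inputs = ['/home/', '/./', '/../', '/.hidden/'];

foreach ($inputs as $s) {
    // "[X]" is an arbitrary placeholder replacement chosen for this demo.
    printf(
        "%s | %s | %s | %s\n",
        $s,
        preg_replace($original, '[X]', $s),
        preg_replace($simplified, '[X]', $s),
        preg_replace($anchored, '[X]', $s)
    );
}
```

Both of the answer's patterns leave /.hidden/ untouched, because they reject any name starting with a dot; the anchored variation replaces it while still skipping /./ and /../.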
answered Sep 23 '22 at 14:09 by anubhava


This is a little verbose, but it definitely does work:

#/((\.[^./][^/]*)|(\.\.[^/]+)|([^./][^/]*))/#
^  |------------| |---------| |----------|
|        |             |               |
|        |        text starting with   |
|        |        two dots, that isn't |
|        |             "." or ".."     |
|  text starting with                  |
|  a dot, that isn't                text not starting
|  "." or ".."                         with a dot
|
delimiter

Does not match:

  • hi
  • //
  • /./
  • /../

Does match:

  • /hi/
  • /.hi/
  • /..hi/
  • /.../
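The two lists above can be checked with a short PHP loop. This sketch assumes preg_match() and writes the third alternative as [^./] (rather than [^.]) so that branch cannot consume a slash; with [^.], an input such as /// would also match, so /// is added to the non-matching list.

```php
<?php
// The answer's pattern, with the third alternative written as [^./] so it
// cannot swallow a second slash.
$pattern = '#/((\.[^./][^/]*)|(\.\.[^/]+)|([^./][^/]*))/#';

$shouldNotMatch = ['hi', '//', '/./', '/../', '///'];
$shouldMatch    = ['/hi/', '/.hi/', '/..hi/', '/.../'];

foreach ($shouldNotMatch as $s) {
    // preg_match() returns 0 when the pattern matches nowhere in $s.
    echo $s, ' => ', preg_match($pattern, $s) === 0 ? 'no match (ok)' : 'MATCH (unexpected)', "\n";
}
foreach ($shouldMatch as $s) {
    echo $s, ' => ', preg_match($pattern, $s) === 1 ? 'match (ok)' : 'NO MATCH (unexpected)', "\n";
}
```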

Have a play around with it on http://regexpal.com/.

I wasn't sure whether or not you wanted to allow //. If you do, stick * before the final / (right after the outer group's closing parenthesis).
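A sketch of that variant, with the third alternative written as [^./] so it cannot consume a slash. The * lets the outer group match zero times, so // is accepted, while /./ and /../ are still rejected.

```php
<?php
// Variant with * after the outer group: the group may match zero times, so an
// empty segment ("//") is accepted. A ? quantifier would accept the same
// strings here without letting the group repeat.
$allowEmpty = '#/((\.[^./][^/]*)|(\.\.[^/]+)|([^./][^/]*))*/#';

foreach (['//', '/hi/', '/./', '/../'] as $s) {
    echo $s, ' => ', preg_match($allowEmpty, $s), "\n";
}
// Expected: "//" and "/hi/" print 1; "/./" and "/../" print 0.
```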

answered Sep 21 '22 at 14:09 by Lightness Races in Orbit