This is a follow-up after reading How to specify "Space or end of string" and "space or start of string"?
That question shows ways to match a word in a phrase, and I can even add a few other solutions. But as soon as a = or " is added to the search term, it quits working. Why?
I am going to search for stackoverflow and replace it with OK using preg_replace():
preg_replace('/\bstackoverflow\b/', 'OK', $input_line)
input:
1: stackoverflow xxx
2: xxx stackoverflow xxx
3: xxx stackoverflow
result:
1: OK xxx
2: xxx OK xxx
3: xxx OK
Now, if I change it to match stackoverflow="", it stops working.
preg_replace('/\bstackoverflow=""\b/', 'OK', $input_line)
input:
1: stackoverflow="" xxx
2: xxx stackoverflow="" xxx
3: xxx stackoverflow=""
result:
1: stackoverflow="" xxx
2: xxx stackoverflow="" xxx
3: xxx stackoverflow=""
The same happens if I use /\bstackoverflow=\b/ or /\bstackoverflow"\b/ as my regex. I already checked the manual to see whether = or " are special characters; they are not. I even tried /\bstackoverflow\=\"\"\b/ anyway.
Why is that?
In that example, removing \b also solves it, but then it also matches nostackoverflow=""not, which I do not want.
I also tried alternatives to \b such as [ ^] and ( |^). Interestingly, [ ^] (intended as space or beginning of line) does not work for the beginning of the line, only for a space. But ( |^) works fine for both.
The \b metacharacter matches at the beginning or end of a word.
Word Boundary: \b The word boundary \b matches positions where one side is a word character (usually a letter, digit or underscore—but see below for variations across engines) and the other side is not a word character (for instance, it may be the beginning of the string or a space character).
The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a “word boundary”.
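The problem is your use of \b, which is a "word boundary." It only matches at a position that has a "word" character on one side and a non-word character (or the start or end of the string) on the other, roughly (^\w|\w$|\W\w|\w\W), where \w is a "word" character [A-Za-z0-9_] and \W is the opposite. The problem is that " is not a "word" character, so when it is followed by a space or by the end of the string there is no word character on either side, and the boundary condition is not met.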
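To make the failure concrete, here is a minimal sketch that runs the pattern from the question against three subjects. The third subject is an illustrative addition, not from the question; it is there only to show that the trailing \b does match once a word character follows the closing quote.

```php
<?php
// Pattern from the question: a word boundary is required after the closing quote.
$pattern = '/\bstackoverflow=""\b/';

// '"' followed by a space: non-word on both sides, so no boundary.
$followedBySpace = preg_match($pattern, 'xxx stackoverflow="" xxx');
// '"' followed by the end of the string: again no word character on either side.
$atEndOfString = preg_match($pattern, 'xxx stackoverflow=""');
// '"' followed by 'x': non-word then word, which is a real boundary.
$followedByWord = preg_match($pattern, 'xxx stackoverflow=""xxx');

var_dump($followedBySpace, $atEndOfString, $followedByWord); // int(0) int(0) int(1)
```

In other words, the trailing \b only succeeds in exactly the case the question wants to exclude.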
Try using \s instead, which will match any whitespace character:
(?:^|\s)stackoverflow=""(?:\s|$)
Most metacharacters lose their special meaning inside a character class. The main exceptions are ^, which negates the class when it is the first character, and -, which forms a range (\ and ] also keep their meaning). This is why [ ^] wouldn't work for you. It was searching for a literal ^.
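The difference between the class and the alternation can be checked directly. This is a small sketch; the subject with a leading ^ is made up for illustration.

```php
<?php
$subject = 'stackoverflow="" xxx';

// [ ^] is a class holding a space and a literal caret, so it must consume one of them.
$withClass = preg_match('/[ ^]stackoverflow=""/', $subject);
// ( |^) is an alternation: either a space or the start-of-string anchor.
$withAlternation = preg_match('/( |^)stackoverflow=""/', $subject);
// The class does match once a literal ^ really precedes the word.
$literalCaret = preg_match('/[ ^]stackoverflow=""/', '^stackoverflow=""');

var_dump($withClass, $withAlternation, $literalCaret); // int(0) int(1) int(1)
```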
$ php -a
Interactive shell
php > $input_line='
php ' stackoverflow="" xxx
php ' xxx stackoverflow="" xxx
php ' xxx stackoverflow=""
php ' ';
php > echo preg_replace('/(?:^|\s)stackoverflow=""(?:\s|$)/', 'OK', $input_line);
OKxxx
xxxOKxxx
xxxOK
https://regex101.com/r/nP2aB8/1
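Note that the output of the session above is OKxxx rather than OK xxx: (?:^|\s) and (?:\s|$) consume the surrounding whitespace, so it is replaced along with the keyword. If the whitespace should survive, one possible variant (an assumption on my part, not part of the original answer) is to assert the surroundings with lookarounds on \S instead of matching them:

```php
<?php
$input = "stackoverflow=\"\" xxx\nxxx stackoverflow=\"\" xxx\nxxx stackoverflow=\"\"";

// (?<!\S) means "not preceded by a non-whitespace character": start of string or whitespace.
// (?!\S) means "not followed by a non-whitespace character": end of string or whitespace.
// Both are zero-length, so the surrounding whitespace is left in place.
$result = preg_replace('/(?<!\S)stackoverflow=""(?!\S)/', 'OK', $input);

echo $result;
// OK xxx
// xxx OK xxx
// xxx OK
```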
From the regular-expressions.info Word boundaries page:
The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a "word boundary". This match is zero-length.
There are three different positions that qualify as word boundaries:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
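The three cases can be observed by listing the offsets at which \b matches. The sketch below uses a hypothetical helper, wordBoundaryOffsets, written only for this illustration.

```php
<?php
// Hypothetical helper: returns the byte offsets at which \b matches in $subject.
function wordBoundaryOffsets(string $subject): array
{
    preg_match_all('/\b/', $subject, $matches, PREG_OFFSET_CAPTURE);
    // Each entry of $matches[0] is [matched text, offset]; keep only the offsets.
    return array_column($matches[0], 1);
}

// Start of string, after "ab", before "cd", end of string.
$plain = wordBoundaryOffsets('ab cd');
// Only around "a": there is no boundary after the final quote.
$quoted = wordBoundaryOffsets('a=""');

print_r($plain);  // offsets 0, 2, 3, 5
print_r($quoted); // offsets 0, 1
```

The second call shows why a trailing \b cannot succeed after ="" at the end of the string.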
A very good explanation from nhahtdh's post:
A word boundary \b is equivalent to: (?:(?<!\w)(?=\w)|(?<=\w)(?!\w))
Which means:
Right ahead, there is (at least) a character that is a word character, and right behind, we cannot find a word character (either the character is not a word character, or it is the start of the string).
OR
Right behind, there is (at least) a character that is a word character, and right ahead, we cannot find a word character (either the character is not a word character, or it is the end of the string).
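As a quick sanity check of that equivalence, the following sketch compares the match positions of \b and of the expanded lookaround form on a few sample strings. The helper matchOffsets and the sample subjects are illustrative assumptions, and a handful of samples is of course a spot check, not a proof.

```php
<?php
// Hypothetical helper: offsets of every match of $pattern in $subject.
function matchOffsets(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
    return array_column($matches[0], 1);
}

$expanded = '/(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))/';
$subjects = ['xxx stackoverflow="" xxx', 'stackoverflow=""', 'a_b-c d', ''];

$allEqual = true;
foreach ($subjects as $subject) {
    // Both patterns are zero-length, so comparing offsets compares the positions matched.
    if (matchOffsets('/\b/', $subject) !== matchOffsets($expanded, $subject)) {
        $allEqual = false;
    }
}

var_dump($allEqual); // bool(true)
```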
\b is not suitable here because whether it matches depends on the characters on both sides of it: it requires a word character on one side and a non-word character (or the edge of the string) on the other. When you build a regex dynamically, you do not know in advance which one to use, \B or \b. For your case, you could use '/\bstackoverflow=""\B/', but that would require logic that picks a word or non-word boundary based on the first and last characters of the keyword. However, there is an easier way: use negative lookarounds.
(?<!\w)stackoverflow=""(?!\w)
See regex demo
The regex contains negative lookarounds instead of word boundaries. The (?<!\w) lookbehind fails the match if there is a word character before stackoverflow="", and the (?!\w) lookahead fails the match if stackoverflow="" is followed by a word character.
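The sketch below contrasts the two approaches for a keyword that ends in a word character and one that does not. The loop and the $results array are illustrative scaffolding, not part of the original answer.

```php
<?php
$results = [];
foreach (['stackoverflow', 'stackoverflow=""'] as $keyword) {
    $quoted = preg_quote($keyword, '/');
    $text = "xxx $keyword xxx";
    $results[$keyword] = [
        // \b on both sides only works when the keyword starts and ends with a word character.
        'word_boundary' => preg_match('/\b' . $quoted . '\b/', $text),
        // The lookarounds only forbid adjacent word characters, whatever the keyword ends with.
        'lookarounds' => preg_match('/(?<!\w)' . $quoted . '(?!\w)/', $text),
    ];
}

// The lookarounds still reject the keyword when it is embedded in a longer word.
$embedded = preg_match('/(?<!\w)stackoverflow=""(?!\w)/', 'nostackoverflow=""not');

print_r($results);
```

With these inputs, the word-boundary version finds stackoverflow but not stackoverflow="", while the lookaround version finds both.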
What the shorthand character class \w matches depends on whether you enable the Unicode modifier /u. Without it, \w matches just [a-zA-Z0-9_] (in the default "C" locale). You can add further restrictions using the lookarounds.
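A small sketch of the effect of /u, assuming a PHP build whose PCRE has Unicode support (the usual case) and using é (U+00E9) as an arbitrary non-ASCII letter:

```php
<?php
$e = "\u{00E9}"; // "é", encoded as two bytes in UTF-8

// Without /u the subject is treated as bytes; a single \w cannot span both bytes.
$byteMode = preg_match('/^\w$/', $e);
// With /u the subject is one code point, and \w is Unicode-aware.
$unicodeMode = preg_match('/^\w$/u', $e);
// Consequently, with /u the lookbehind treats "é" as a word character and rejects the match.
$afterLetter = preg_match('/(?<!\w)stackoverflow=""(?!\w)/u', $e . 'stackoverflow=""');

var_dump($byteMode, $unicodeMode, $afterLetter); // int(0) int(1) int(0)
```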
PHP demo:
$re = '/(?<!\w)stackoverflow=""(?!\w)/';
$str = ",stackoverflow=\"\" xxx\nxxx stackoverflow=\"\" xxx\nxxx stackoverflow=\"\"\nstackoverflow=\"\" xxx";
echo preg_replace($re, "NEW=\"\"", $str);
NOTE: If you pass the keyword as a variable, remember to escape all special characters in it with preg_quote:
$re = '/(?<!\w)' . preg_quote($keyword, '/') . '(?!\w)/';
Here, notice the second argument to preg_quote, which is /, the regex delimiter char.
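Putting the pieces together, a possible wrapper might look like the following. The function name replaceWholeKeyword and the sample strings are hypothetical, chosen only to show that metacharacters and the delimiter inside the keyword are treated literally.

```php
<?php
// Hypothetical helper: replaces $keyword only where it is not adjacent to a word character.
function replaceWholeKeyword(string $keyword, string $replacement, string $subject): string
{
    // preg_quote escapes regex metacharacters; '/' is passed because it is the delimiter here.
    $re = '/(?<!\w)' . preg_quote($keyword, '/') . '(?!\w)/';
    return preg_replace($re, $replacement, $subject);
}

// The '.' in the keyword is literal after quoting, so "a/bxc" is left alone.
$withDelimiter = replaceWholeKeyword('a/b.c', 'OK', 'x a/b.c y a/bxc');
// The embedded occurrence is skipped; the standalone one is replaced.
$withQuotes = replaceWholeKeyword('stackoverflow=""', 'OK', 'nostackoverflow=""not xxx stackoverflow=""');

echo $withDelimiter, "\n"; // x OK y a/bxc
echo $withQuotes, "\n";    // nostackoverflow=""not xxx OK
```

Keep in mind that preg_replace still interprets references such as $1 in the replacement string, so a replacement taken from user input may need its own escaping.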