preg_replace isn't working for some words/characters

Question

$str = 'کس نے موسیٰ کے بارے میں سنا ہے؟';
$str = preg_replace('/(?<=\b)موسیٰ(?=\b)/u', 'Musa', $str);
$str = preg_replace('/(?<=\b)سنا(?=\b)/u', 'suna', $str);
echo $str;

This fails to replace موسیٰ. It should give کس نے Musa کے بارے میں suna ہے؟ but instead gives کس نے موسیٰ کے بارے میں suna ہے؟.

This is happening for all words that end with a ٰ, like تعالیٰ . It works for words where ٰ is in the middle of the word (no words begin with a ٰ). Does this mean that \b just doesn't work with ٰ? Is it a bug?

Wiktor Stribiżew · Accepted Answer

The reason is that a word boundary matches in the following positions:

Before the first character in the string, if the first character is a word character.

After the last character in the string, if the last character is a word character.

Between two characters in the string, where one is a word character and the other is not a word character.

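The "offending" symbol is U+0670 ARABIC LETTER SUPERSCRIPT ALEF, which belongs to \p{Mn} (the nonspacing mark Unicode category) and is thus a non-word symbol. The space that follows it is also a non-word character, so there is no word boundary between the two and the trailing \b fails. A \b after this mark would match only if the next character belonged to \w (letter, digit, _).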
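To check this on your own setup, here is a minimal sketch that inspects how the installed PCRE classifies U+0670. The variable names are mine, and the word is built from code point escapes (PHP 7+) only to make the final mark explicit. I am assuming the \w result can differ between PCRE versions, so the script prints it rather than taking it for granted.

```php
<?php
// U+0670 ARABIC LETTER SUPERSCRIPT ALEF, the last code point of the word in the question.
$mark = "\u{0670}";

// The word from the question followed by a space and the next word,
// spelled out as code points: U+0645 U+0648 U+0633 U+06CC U+0670, space, U+06A9 U+06D2.
$subject = "\u{0645}\u{0648}\u{0633}\u{06CC}\u{0670} \u{06A9}\u{06D2}";

// 1 if the character is in the nonspacing mark category (Mn).
$isMark = preg_match('/^\p{Mn}$/u', $mark);

// 1 if this PCRE build treats the character as a word character.
// Assumption: this may vary with the PCRE version, hence printed, not assumed.
$isWord = preg_match('/^\w$/u', $mark);

// 1 if \b matches between the mark and the space that follows it.
$boundaryAfter = preg_match('/\x{0670}\b /u', $subject);

printf("Mn: %d, \\w: %d, \\b after mark: %d\n", $isMark, $isWord, $boundaryAfter);
```

Whenever the second value is 0, the third is 0 as well, which is the situation described in the question: the pattern reaches the end of the word but \b cannot match there.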

Use unambiguous boundaries instead. They match only when the search phrase is neither preceded nor followed by a word char:

$str = 'کس نے موسیٰ کے بارے میں سنا ہے؟';
$str = preg_replace('/(?<!\w)موسیٰ(?!\w)/u', 'Musa', $str);
$str = preg_replace('/(?<!\w)سنا(?!\w)/u', 'suna', $str);
echo $str; // => کس نے Musa کے بارے میں suna ہے؟


The (?<!\w) is a negative lookbehind making sure there is no word char immediately before the subsequent consuming pattern, and (?!\w) is a negative lookahead that makes sure there is no word char immediately after the preceding consuming pattern.
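If you have many words to replace, the two lookarounds can be wrapped in a small helper. This is only a sketch: replace_whole_word is a hypothetical name, not a PHP built-in, and the preg_quote call and the null fallback are my additions on top of the pattern above.

```php
<?php
/**
 * Replace $word only where it is not adjacent to a word character.
 * Hypothetical helper, not part of PHP.
 */
function replace_whole_word(string $text, string $word, string $replacement): string
{
    // preg_quote escapes regex metacharacters in the search term, plus the '/' delimiter.
    $pattern = '/(?<!\w)' . preg_quote($word, '/') . '(?!\w)/u';

    // preg_replace returns null on error (for example invalid UTF-8 input);
    // in that case fall back to the unchanged text.
    // Note: $replacement is still parsed for references such as $1 or \1.
    return preg_replace($pattern, $replacement, $text) ?? $text;
}

$str = 'کس نے موسیٰ کے بارے میں سنا ہے؟';
$str = replace_whole_word($str, 'موسیٰ', 'Musa');
$str = replace_whole_word($str, 'سنا', 'suna');
echo $str, "\n";
```

Because the lookarounds only require that no word char is adjacent, the helper does not care whether the search term itself begins or ends with a mark.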


Tags: regex, php, encoding

twharmon
