In JavaScript:
"ab abc cab ab ab".replace(/\bab\b/g, "AB");
correctly gives me:
"AB abc cab AB AB"
When I use non-ASCII characters (Greek, for example) though:
"αβ αβγ γαβ αβ αβ".replace(/\bαβ\b/g, "AB");
the word boundary operator doesn't seem to work:
"αβ αβγ γαβ αβ αβ"
Is there a solution to this?
A word boundary is a zero-width test between two characters. To pass the test, there must be a word character on one side, and a non-word character on the other side. It does not matter which side each character appears on, but there must be one of each.
Concretely, \b matches between two characters in the string if one is a word character ( \w ) and the other is not ( \W , the inverse of \w ). It also matches before the first character or after the last character of the string if that character is a word character ( \w ).
A word boundary \b is a test, just like ^ and $ . When the regexp engine (program module that implements searching for regexps) comes across \b , it checks that the position in the string is a word boundary.
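To make the test concrete, here is a minimal sketch that lists every index in an ASCII string at which \b succeeds. The helper name wordBoundaryPositions is hypothetical (it is not part of any library), and it assumes an engine that supports the sticky (y) flag.

```javascript
// Hypothetical helper: collect every index at which \b matches.
function wordBoundaryPositions(text) {
  const positions = [];
  for (let i = 0; i <= text.length; i++) {
    const re = /\b/y; // sticky flag: only try to match exactly at lastIndex
    re.lastIndex = i;
    if (re.test(text)) positions.push(i);
  }
  return positions;
}

// Boundaries sit at the start of "ab", its end, the start of "abc", and its end.
const asciiBoundaries = wordBoundaryPositions("ab abc");
console.log(asciiBoundaries); // [0, 2, 3, 6]
```

Each reported index has a word character on exactly one side, which is the "one of each" rule described above.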
The word boundary assertion matches only where a word character is not preceded or followed by another word character (so .\b. is equivalent to \W\w or \w\W). In JavaScript, \w is defined as [A-Za-z0-9_], so it doesn't match Greek characters, and thus you cannot use \b in this case.
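A quick check of that definition, as a sketch: the variable names are only illustrative, and the last line assumes an engine with Unicode property escapes (ES2018 or later).

```javascript
// \w covers only ASCII letters, digits, and underscore.
const asciiIsWord = /\w/.test("a");  // true
const greekIsWord = /\w/.test("α");  // false

// The u flag on its own does not widen \w to Greek letters.
const greekIsWordWithU = /\w/u.test("α");  // false

// A Unicode property escape does recognise α as a letter.
const greekIsLetter = /\p{L}/u.test("α");  // true

// With no \w characters anywhere, \b has nothing to match against.
const greekHasBoundary = /\b/.test("αβ αβγ");  // false

console.log(asciiIsWord, greekIsWord, greekIsWordWithU, greekIsLetter, greekHasBoundary);
```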
What you could do instead is to use this:
"αβ αβγ γαβ αβ αβ".replace(/(^|\s)αβ(?=\s|$)/g, "$1AB")
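If your environment supports lookbehind and Unicode property escapes (both ES2018, so this is an assumption about your target engines), another option is to build a Unicode-aware boundary yourself. This is only a sketch: replaceWholeWord is a hypothetical helper, and treating letters, numbers, and underscore as "word characters" is one possible choice.

```javascript
// Hypothetical helper: replace `word` only where it is not adjacent to
// a Unicode letter, a Unicode number, or an underscore.
function replaceWholeWord(text, word, replacement) {
  // Escape regex syntax characters so `word` is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const wordChar = "[\\p{L}\\p{N}_]";
  const re = new RegExp(`(?<!${wordChar})${escaped}(?!${wordChar})`, "gu");
  // A function replacement avoids special handling of "$" in `replacement`.
  return text.replace(re, () => replacement);
}

const result = replaceWholeWord("αβ αβγ γαβ αβ αβ", "αβ", "AB");
console.log(result); // "AB αβγ γαβ AB AB"
```

Unlike the whitespace-based pattern above, this version also treats punctuation as a boundary, so "αβ," would be replaced as well.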