
utf-8 word boundary regex in javascript

In JavaScript:

"ab abc cab ab ab".replace(/\bab\b/g, "AB"); 

correctly gives me:

"AB abc cab AB AB" 

When I use non-ASCII characters such as Greek letters, though:

"αβ αβγ γαβ αβ αβ".replace(/\bαβ\b/g, "AB"); 

the word boundary operator doesn't seem to work:

"αβ αβγ γαβ αβ αβ" 

Is there a solution to this?

cherouvim asked May 21 '10 11:05


People also ask

What is word boundary in Javascript?

A word boundary is a zero-width test between two characters. To pass the test, there must be a word character on one side, and a non-word character on the other side. It does not matter which side each character appears on, but there must be one of each.

What is word boundary in regex?

A word boundary is a position in one of three places: before the first character in a string if that character is a word character ( \w ); between two characters if one is a word character ( \w ) and the other is not ( \W, the inverse character set of \w ); or after the last character in a string if that character is a word character ( \w ).

What does \b mean in regex?

A word boundary \b is a test, just like ^ and $ . When the regexp engine (program module that implements searching for regexps) comes across \b , it checks that the position in the string is a word boundary.


1 Answer

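The word boundary assertion \b matches only at a position where a word character is not preceded or not followed by another word character (so .\b. is equivalent to \W\w or \w\W). In JavaScript, \w is defined as [A-Za-z0-9_], so it does not match Greek characters. To the regex engine, a Greek word surrounded by spaces is just a run of non-word characters with no boundary in it, and thus you cannot use \b for this case.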
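To make that concrete, here is a small sketch (the variable names are only for illustration) showing that \w rejects a Greek letter, and that \b therefore finds no boundary around a standalone Greek word but does find one where a Greek letter touches an ASCII letter:

```javascript
// \w is ASCII-only in JavaScript: [A-Za-z0-9_]
const latinIsWord = /\w/.test("a");
const greekIsWord = /\w/.test("α");

// No boundary is seen around a standalone Greek word, so nothing is replaced...
const standalone = "αβ αβγ".replace(/\bαβ\b/g, "AB");

// ...but a boundary IS seen where a Greek letter touches an ASCII letter.
const embedded = "aαβb".replace(/\bαβ\b/g, "AB");

console.log(latinIsWord, greekIsWord); // true false
console.log(standalone);               // αβ αβγ
console.log(embedded);                 // aABb
```

So \b is not merely inactive for non-ASCII text; it fires in the wrong places.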

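What you could do instead is match the surrounding whitespace, or the start and end of the string, explicitly. The leading whitespace is captured and put back with $1, and the lookahead leaves the trailing whitespace unconsumed so that adjacent occurrences still match: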

"αβ αβγ γαβ αβ αβ".replace(/(^|\s)αβ(?=\s|$)/g, "$1AB") 
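If the engine supports Unicode property escapes and lookbehind (both ES2018, which is an assumption about your target environment), a possible alternative is to define the boundary in terms of Unicode letters and digits instead of whitespace. The sketch below uses a hypothetical helper, replaceWholeWord, that is not part of any library; unlike the whitespace-based pattern, it also treats punctuation as a boundary:

```javascript
// Hypothetical helper: replaces `word` only where it is not directly adjacent
// to a Unicode letter, a Unicode digit, or an underscore.
function replaceWholeWord(text, word, replacement) {
  // Escape regex syntax characters so `word` is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    `(?<![\\p{L}\\p{N}_])${escaped}(?![\\p{L}\\p{N}_])`,
    "gu" // the u flag is required for \p{...}
  );
  // A function replacement avoids special handling of "$" in `replacement`.
  return text.replace(pattern, () => replacement);
}

console.log(replaceWholeWord("αβ αβγ γαβ αβ αβ", "αβ", "AB"));
// AB αβγ γαβ AB AB
console.log(replaceWholeWord("αβ, αβγ.", "αβ", "AB"));
// AB, αβγ.
```

Note that this sketch does not account for combining marks (\p{M}), which may matter for some scripts.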
Gumbo answered Sep 28 '22 09:09