UTF-8 word boundary regex in JavaScript

In JavaScript:

"ab abc cab ab ab".replace(/\bab\b/g, "AB");

correctly gives me:

"AB abc cab AB AB"

When I use non-ASCII (UTF-8) characters though:

"αβ αβγ γαβ αβ αβ".replace(/\bαβ\b/g, "AB");

the word boundary operator doesn't seem to work:

"αβ αβγ γαβ αβ αβ"

Is there a solution to this?

asked May 21 '10 11:05

cherouvim

1 Answer

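The word boundary assertion \b matches only at a position where a word character is not preceded or not followed by another word character (so .\b. is equivalent to \W\w or \w\W). In JavaScript, \w is defined as [A-Za-z0-9_], so it does not match Greek characters. In your string every character is therefore a non-word character, no position qualifies as a boundary, and \b cannot be used for this case.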
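A quick way to check that reasoning, as a minimal sketch (the variable names are only illustrative):

```javascript
// \w covers only ASCII letters, digits and underscore,
// so a Greek letter counts as a non-word character.
const asciiIsWord = /\w/.test("a"); // true
const greekIsWord = /\w/.test("α"); // false

// With no word character on either side, \b finds no boundary around "αβ".
const boundaryFound = /\bαβ\b/.test("αβ γαβ"); // false
```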

What you could do instead is to use this:

"αβ αβγ γαβ αβ αβ".replace(/(^|\s)αβ(?=\s|$)/g, "$1AB")

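For the sample string, the replacement above returns "AB αβγ γαβ AB AB". Note that it only treats whitespace and the string ends as delimiters, so "αβ," would not be matched. On engines that support ES2018 lookbehind and Unicode property escapes (an assumption about your runtime), a Unicode-aware boundary could be sketched as below. The helper name replaceWholeWord is hypothetical, and counting letters, digits and underscore as word characters is an assumption modelled on \w:

```javascript
// Hypothetical helper: replace `word` only where it is not adjacent to a
// Unicode letter, digit or underscore. Requires ES2018 (lookbehind, \p{...}).
function replaceWholeWord(text, word, replacement) {
  // Escape regex metacharacters so `word` is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    `(?<![\\p{L}\\p{N}_])${escaped}(?![\\p{L}\\p{N}_])`,
    "gu"
  );
  // A function replacement avoids special handling of "$" in `replacement`.
  return text.replace(pattern, () => replacement);
}

const result = replaceWholeWord("αβ αβγ γαβ αβ αβ", "αβ", "AB");
// result === "AB αβγ γαβ AB AB"
```

Unlike the whitespace-based pattern, this variant also matches next to punctuation, since the lookarounds only reject neighbouring letters, digits and underscores.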
answered Sep 28 '22 09:09

Gumbo
