Why can't I use accented characters next to a word boundary?

I'm trying to build a dynamic regex that matches a person's name. It works without problems for most names, but it fails when the name ends with an accented character.

Example: Some Fancy Namé

The regex I've used so far is:

/\b(Fancy Namé|Namé)\b/i

Used like this:

"Goal: Some Fancy Namé. Awesome.".replace(/\b(Fancy Namé|Namé)\b/i, '<a href="#">$1</a>');

This simply won't match. If I replace the é with an e, it matches just fine. If I try to match a name such as "Some Fancy Naméa", it works just fine. If I remove the last word boundary anchor, it works just fine.

Why doesn't the word boundary anchor work here? Any suggestions on how I would get around this problem?

I have considered using something like this, but I'm not sure what the performance penalties would be like:

"Some fancy namé. Allow me to elaborate.".replace(/([\s.,!?])(fancy namé|namé)([\s.,!?]|$)/g, '$1<a href="#">$2</a>$3')

Suggestions? Ideas?

925

asked Mar 15 '10 19:03

Rexxars

2 Answers

JavaScript's regex implementation is not Unicode-aware. It only knows the ‘word characters’ in standard low-byte ASCII, which does not include é or any other accented or non-English letters.

Because é is not a word character to JS, é followed by a space can never be considered a word boundary. (It would match \b if used in the middle of a word, like Namés.)
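A quick way to see this for yourself is a handful of `test` calls. This is only an illustrative sketch; the variable names are made up for the example, and é is written as `\u00e9` in one place to make sure the precomposed character is being tested.

```javascript
// \w (and therefore \b) only treats [A-Za-z0-9_] as word characters.
const isWordE = /\w/.test("e");            // plain ASCII letter
const isWordEAcute = /\w/.test("\u00e9");  // "é" (U+00E9) is not a word character

// "é" followed by "." is non-word next to non-word, so \b cannot match there.
const endOfName = /Namé\b/.test("Fancy Namé. Awesome.");

// "é" followed by "s" is non-word next to word, so \b does match.
const midWord = /Namé\b/.test("Fancy Namés");

console.log(isWordE, isWordEAcute, endOfName, midWord);
// prints: true false false true
```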

/([\s.,!?])(fancy namé|namé)([\s.,!?]|$)/

Yeah, that would be the usual workaround for JS (though probably with more punctuation characters). For other languages you'd generally use lookahead/lookbehind to avoid matching the pre and post boundary characters, but these are poorly supported/buggy in JS so best avoided.
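For a dynamic list of names, the same workaround might be wrapped up roughly like this. It is a sketch, not a definitive implementation: `escapeRegExp` and `linkNames` are hypothetical helper names, the delimiter set is just the one from the question, and a `^` alternative is added on the assumption that a name may also appear at the very start of the text.

```javascript
// Escape regex metacharacters so each name is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap every listed name found in `text` in a link.
function linkNames(text, names) {
  // Try longer names first so "Fancy Namé" wins over "Namé".
  const alternatives = names
    .slice()
    .sort(function (a, b) { return b.length - a.length; })
    .map(escapeRegExp)
    .join("|");

  // Group 1: start of string or a delimiter, group 2: the name,
  // group 3: a delimiter or end of string. Both delimiters are captured
  // and written back by the replacement string.
  const pattern = new RegExp(
    "(^|[\\s.,!?])(" + alternatives + ")([\\s.,!?]|$)",
    "gi"
  );
  return text.replace(pattern, '$1<a href="#">$2</a>$3');
}

console.log(linkNames("Goal: Some Fancy Namé. Awesome.", ["Namé", "Fancy Namé"]));
// prints: Goal: Some <a href="#">Fancy Namé</a>. Awesome.
```

Because the trailing delimiter is consumed by the match, two names separated by a single delimiter would only have the first one linked in a global replace.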

184

answered Oct 19 '22 23:10

bobince

Rob is correct. Quoted from the ECMAScript 3rd edition:

15.10.2.6 Assertion:

The production Assertion \b evaluates by ...

2. Call IsWordChar(e−1) and let a be the boolean result
3. Call IsWordChar(e) and let b be the boolean result

and

The internal helper function IsWordChar ... performs the following:

3. If c is one of the sixty-three characters in the table below, return true.
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9 _

Since é is not one of these 63 characters, the location between é and a will be considered a word boundary.
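To make the quoted steps concrete, here is a rough emulation of the check in plain JavaScript. It is only a sketch of my reading of the spec: the function names are my own, and it assumes that positions outside the string count as non-word characters and that \b succeeds when exactly one of a and b is true.

```javascript
// The 63 characters from the IsWordChar table quoted above.
const WORD_CHARS =
  "abcdefghijklmnopqrstuvwxyz" +
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
  "0123456789_";

// Assumption: an index before the start or past the end is never a word character.
function isWordChar(str, index) {
  if (index < 0 || index >= str.length) return false;
  return WORD_CHARS.indexOf(str.charAt(index)) !== -1;
}

// \b at position e: compare the character before e with the character at e.
function isWordBoundary(str, e) {
  const a = isWordChar(str, e - 1);
  const b = isWordChar(str, e);
  return a !== b;
}

// "\u00e9" is the precomposed "é".
console.log(isWordBoundary("Name.", 4));        // true:  "e" is a word character, "." is not
console.log(isWordBoundary("Nam\u00e9.", 4));   // false: neither "é" nor "." is a word character
console.log(isWordBoundary("Nam\u00e9a", 4));   // true:  "é" is not a word character, "a" is
```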

If you know the class of characters, you may use a negative lookahead assertion, e.g.

/(^|[^\wÀ-ÖØ-öø-ſ])(Fancy Namé|Namé)(?![\wÀ-ÖØ-öø-ſ])/
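Plugged into the replace call from the question, that pattern could be used along these lines. Treat it as a sketch: `EXTRA`, `namePattern` and `linkName` are names I made up, the character ranges are the same ones as above written as escapes (À-Ö, Ø-ö, ø-ſ), and the `g` and `i` flags are my assumption about what the question needs.

```javascript
// Non-ASCII Latin letters that should also count as part of a word:
// U+00C0-U+00D6, U+00D8-U+00F6, U+00F8-U+017F (this skips × and ÷).
const EXTRA = "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u017F";

// Group 1 captures the character before the name (or the start of the string).
// The negative lookahead rejects a following word character without consuming it.
const namePattern = new RegExp(
  "(^|[^\\w" + EXTRA + "])(Fancy Namé|Namé)(?![\\w" + EXTRA + "])",
  "gi"
);

// Hypothetical helper: link every standalone occurrence of the name.
function linkName(text) {
  return text.replace(namePattern, '$1<a href="#">$2</a>');
}

console.log(linkName("Goal: Some Fancy Namé. Awesome."));
// prints: Goal: Some <a href="#">Fancy Namé</a>. Awesome.
```

Only the leading character has to be captured and written back, since the lookahead leaves whatever follows the name untouched.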

answered Oct 20 '22 00:10

kennytm

Tags:

javascript

regex

replace

unicode

diacritics
