Why does \w match only English words in JavaScript regex?

I'm trying to find URLs in some text, using JavaScript code. The problem is that the regular expression I'm using relies on \w to match letters and digits inside the URL, but \w doesn't match non-English characters (in my case, Hebrew letters).

So what can I use instead of \w to match all letters in all languages?

asked Dec 29 '08 14:12

Doron Yaacoby

3 Answers

Because \w only matches the ASCII characters 48-57 ('0'-'9'), 65-90 ('A'-'Z'), 95 ('_') and 97-122 ('a'-'z'). Hebrew characters and other non-ASCII letters (for example, ö or ñ) are outside of those ranges.
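A quick way to check this yourself is to test every code below 128 against \w and then try a Hebrew word. This is only a sketch; the sample strings are made up for illustration.

```javascript
// Collect every character with a code below 128 that \w accepts.
const wordChars = [];
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (/^\w$/.test(ch)) {
    wordChars.push(ch);
  }
}
// Digits, uppercase letters, the underscore, and lowercase letters.
console.log(wordChars.join(""));

// Hypothetical sample strings, chosen only for illustration.
const english = "shalom";
const hebrew = "\u05E9\u05DC\u05D5\u05DD"; // the Hebrew word "shalom"

console.log(/^\w+$/.test(english)); // true
console.log(/^\w+$/.test(hebrew));  // false: Hebrew letters are outside \w
```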

Instead of matching foreign language characters (there are so many of them, in many different Unicode ranges), you might be better off looking for the characters that delimit your words: spaces, quotation marks, and other punctuation.
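Here is a rough sketch of that delimiter-based approach. The delimiter set, the sample text, and the "starts with http:// or https://" check are all assumptions for illustration, and real input would probably need a longer delimiter list.

```javascript
// Hypothetical sample text; "\u05E9\u05DC\u05D5\u05DD" is the Hebrew word "shalom".
const text =
  'See "http://example.com/\u05E9\u05DC\u05D5\u05DD" or http://example.com/hello.';

// Split on the characters assumed to delimit words:
// whitespace, quotation marks, brackets, and commas.
const tokens = text.split(/[\s"'()<>,]+/).filter(Boolean);

// Keep tokens that look like URLs, then trim trailing sentence punctuation.
const urls = tokens
  .filter((token) => /^https?:\/\//.test(token))
  .map((token) => token.replace(/[.!?;:]+$/, ""));

console.log(urls); // two URLs, the first one ending in Hebrew letters
```

Because nothing here depends on \w, the Hebrew part of the first URL survives without listing any Hebrew ranges.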

answered Sep 20 '22 19:09

David Koelle

The ECMA 262 v3 standard, which defines the programming language commonly known as JavaScript, stipulates that \w should be equivalent to [a-zA-Z0-9_] and that \d should be equivalent to [0-9]. \s on the other hand matches both ASCII and Unicode whitespace, according to the standard.
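The difference between the three classes is easy to see with a few one-character tests. The characters below are just examples I picked; any non-ASCII digit, space, or letter should behave the same way.

```javascript
// Each entry tests a single character against one shorthand class.
const checks = {
  asciiDigit: /^\d$/.test("7"),            // true
  arabicIndicDigit: /^\d$/.test("\u0667"), // false: ARABIC-INDIC DIGIT SEVEN
  asciiSpace: /^\s$/.test(" "),            // true
  noBreakSpace: /^\s$/.test("\u00A0"),     // true: NO-BREAK SPACE
  lineSeparator: /^\s$/.test("\u2028"),    // true: LINE SEPARATOR
  hebrewLetter: /^\w$/.test("\u05D0"),     // false: HEBREW LETTER ALEF
};

console.log(checks);
```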

JavaScript as defined by that standard does not support the \p syntax for matching Unicode properties either, so there isn't a good way to do this. You could match all Hebrew characters with:

[\u0590-\u05FF]

This simply matches any code point in the Hebrew block.
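For comparison, and assuming your runtime is much newer than ECMA 262 v3: engines that implement ES2018 accept Unicode property escapes when the u flag is set. The sketch below only works under that assumption; the variable names and sample strings are made up.

```javascript
// Requires an ES2018+ engine and the u flag on each regex.
const anyLetter = /^\p{L}+$/u;              // letters from any script
const hebrewOnly = /^\p{Script=Hebrew}+$/u; // characters in the Hebrew script

const shalom = "\u05E9\u05DC\u05D5\u05DD"; // the Hebrew word "shalom"

console.log(anyLetter.test(shalom));   // true
console.log(anyLetter.test("hello"));  // true
console.log(anyLetter.test("123"));    // false: digits are not letters
console.log(hebrewOnly.test(shalom));  // true
console.log(hebrewOnly.test("hello")); // false
```

On older engines the explicit \u0590-\u05FF range remains the portable option.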

You can match any ASCII word character or any Hebrew character with:

[\w\u0590-\u05FF]
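Used in a URL search, that class might look something like this. The URL pattern is a deliberately simplified assumption (scheme followed by a run of allowed characters), not a full URL grammar, and the sample strings are hypothetical.

```javascript
// Character class from above: ASCII word characters plus the Hebrew block.
const wordOrHebrew = /[\w\u0590-\u05FF]+/g;

// "\u05E9\u05DC\u05D5\u05DD" is the Hebrew word "shalom".
const sample = "hello \u05E9\u05DC\u05D5\u05DD world_1";
const words = sample.match(wordOrHebrew);
console.log(words); // three matches: "hello", the Hebrew word, "world_1"

// Simplified URL patterns: one with \w only, one with the Hebrew block added.
const asciiUrl = /https?:\/\/[\w.\-\/]+/g;
const hebrewUrl = /https?:\/\/[\w\u0590-\u05FF.\-\/]+/g;

const page = "Visit http://example.com/\u05E9\u05DC\u05D5\u05DD today";
const asciiMatch = page.match(asciiUrl);
const hebrewMatch = page.match(hebrewUrl);

console.log(asciiMatch);  // stops right before the Hebrew path segment
console.log(hebrewMatch); // includes the Hebrew path segment
```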

answered Sep 19 '22 19:09

Jan Goyvaerts

I think you are looking for this regex:

^[אבגדהוזחטיכלמנסעפצקרשתץףןםךa-zA-Z0-9\s\.\-_\\\/]+$

answered Sep 19 '22 19:09

lani

Tags: javascript, regex, hebrew