<p>I would like to use this regular expression <strong>new RegExp("\b"+pat+"\b")</strong> on Greek text, but the "\b" metacharacter supports only ASCII characters.</p> <p>I tried the XRegExp library but I didn't manage to solve the issue.</p> <p>Any suggestions would be greatly appreciated.</p>



Javascript unicode (greek) regular expressions

2 Answers

I think this will help answer your question:

<script src="xregexp.js"></script>
<script src="xregexp-unicode-base.js"></script>
<script>
    var unicodeWord = XRegExp("^\\p{L}+$");

    unicodeWord.test("Русский"); // true
    unicodeWord.test("日本語"); // true
    unicodeWord.test("العربية"); // true
</script>

<!-- \p{L} is included in the base script, but other categories, scripts,
and blocks require token packages -->
<script src="xregexp-unicode-scripts.js"></script>
<script>
    XRegExp("^\\p{Katakana}+$").test("カタカナ"); // true
</script>

Please refer to the following page: http://xregexp.com/plugins/

114

answered Oct 12 '22 23:10

John Peter

The short answer is that you cannot use JavaScript's native \b, or any library that relies on it, to match words the way you want. As you already stated, \b matches at word boundaries, and a word must consist of word characters. In JavaScript (and in some other regex implementations) the word characters are a-z, A-Z, 0-9 and _. Many other languages implement the \b metacharacter differently from JavaScript.
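
To see the effect described above, here is a minimal sketch. The helper name `wholeWord` and the sample strings are made up for illustration; only `RegExp` and `\b` come from the question.

```javascript
// Build a whole-word matcher the way the question does. Note the doubled
// backslash: inside a string literal, "\b" on its own is a backspace.
function wholeWord(pat) {
  return new RegExp("\\b" + pat + "\\b");
}

// ASCII letters count as word characters, so \b finds the boundaries.
const asciiHit = wholeWord("word").test("a word here");   // true

// Greek letters are not word characters for \b, so there is no position
// in this string where \b can match, and the whole pattern fails.
const greekHit = wholeWord("λόγος").test("ο λόγος του");  // false

console.log(asciiHit, greekHit);
```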

The answer "JavaScript does not support Unicode" is a bit too easy and in fact completely wrong. JavaScript just doesn't use Unicode for these character classes. If JavaScript didn't support Unicode, you couldn't even use Unicode characters in string literals, and of course that is possible in JavaScript.

According to the ECMA 262 Standard (ECMAScript) (Section 15.10.2.6):

[...] The production Assertion :: \ b evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following:

1. Let e be x's endIndex.
2. Call IsWordChar(e–1) and let a be the Boolean result.
3. Call IsWordChar(e) and let b be the Boolean result.
4. If a is true and b is false, return true.
5. If a is false and b is true, return true.
6. Return false. [..]

The abstract operation IsWordChar takes an integer parameter e and performs the following:

1. If e == –1 or e == InputLength, return false.
2. Let c be the character Input[e].
3. If c is one of the sixty-three characters below, return true. a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _
4. Return false
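
The two quoted algorithms can be sketched in plain JavaScript as follows. This is only an illustration of the steps above, not engine code; the names `WORD_CHARS`, `isWordChar` and `isWordBoundary` are my own, and the input is passed in explicitly instead of coming from a State.

```javascript
// The sixty-three characters listed in the quoted IsWordChar definition.
const WORD_CHARS =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_";

// Mirrors IsWordChar: false outside the input, otherwise a lookup in the list.
function isWordChar(input, e) {
  if (e === -1 || e === input.length) return false;
  return WORD_CHARS.includes(input[e]);
}

// Mirrors the \b assertion: true when exactly one side is a word character.
function isWordBoundary(input, e) {
  const a = isWordChar(input, e - 1);
  const b = isWordChar(input, e);
  return a !== b;
}

console.log(isWordBoundary("a word", 2));   // true: " " then "w"
console.log(isWordBoundary("ο λόγος", 2));  // false: " " then "λ"
```

Steps 4 to 6 of the assertion collapse into `a !== b`, since they return true exactly when the two results differ.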

This shows that \b uses the IsWordChar algorithm to decide whether a position is a word boundary. The definition of IsWordChar lists exactly which characters return true.

In my opinion this has absolutely nothing to do with the character set being used. It's neither ASCII nor Unicode compliant here. It's just these 63 characters.
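
Since \b is tied to those 63 characters, one possible workaround is to spell the boundary out by hand instead of relying on \b. The sketch below assumes an engine that supports Unicode property escapes (`\p{...}` with the `u` flag) and lookbehind, which are newer than the spec section quoted above; `escapeRegExp` and `unicodeWordRegExp` are hypothetical helper names, and the choice of letters, marks, digits and `_` as "word characters" is my assumption.

```javascript
// Escape regex syntax characters so the pattern is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Emulate \b...\b: the match must not be preceded or followed by a
// letter, combining mark, digit or underscore in any script.
function unicodeWordRegExp(pat) {
  const wordChar = "[\\p{L}\\p{M}\\p{N}_]";
  return new RegExp(
    "(?<!" + wordChar + ")" + escapeRegExp(pat) + "(?!" + wordChar + ")",
    "u"
  );
}

console.log(unicodeWordRegExp("λόγος").test("ο λόγος του")); // true
console.log(unicodeWordRegExp("λόγο").test("ο λόγος του"));  // false: "ς" follows
```

On engines without these features, the same idea needs explicit character ranges or a library such as XRegExp, as in the other answer.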

answered Oct 13 '22 00:10

Chris


Tags: javascript, regex, unicode, character-properties, xregexp

kylito
