
How to match an Arabic word with "tashkel"?

I'm using the following function to highlight a given word, and it works fine in English:

function highlight(str, toBeHighlightedWord)
{
    toBeHighlightedWord = "(\\b" + toBeHighlightedWord.replace(/([{}()[\]\\.?*+^$|=!:~-])/g, "\\$1") + "\\b)";
    var r = new RegExp(toBeHighlightedWord, "igm");
    str = str.replace(/(>[^<]+<)/igm, function(a){
        return a.replace(r, "<span color='red' class='hl'>$1</span>");
    });
    return str;
}

But it does not work for Arabic text.

So how can I modify the regex to match Arabic words as well, including Arabic words with tashkel? Tashkel are characters added between the original letters. For example, "محمد" is written without tashkel and "مُحَمَّدُ" is written with tashkel. The tashkel are the decoration of the word, and these little marks are characters in their own right.

Hager Aly asked Jun 14 '14 07:06



1 Answer

In JavaScript, the word boundary \b only works with the characters matched by \w, that is [a-zA-Z0-9_]. A lookbehind assertion is not an option here either, since JavaScript did not support it when this was written (it was added later, in ES2018) and older engines still lack it.
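A quick way to see the limitation, as a minimal sketch (the sample strings and variable names are only illustrative):

```javascript
// \b is defined in terms of \w, which covers [A-Za-z0-9_] only.
var latinText = "a cat sat";
var arabicText = "قال محمد نعم";

// Between " " and "c" there is a \W -> \w transition, so \b matches.
var latinFound = /\bcat\b/.test(latinText);

// Arabic letters are not \w, so a space next to an Arabic letter is
// \W next to \W: no boundary exists and the pattern never matches.
var arabicFound = /\bمحمد\b/.test(arabicText);

console.log(latinFound, arabicFound);
```

The word is present in both strings, but only the Latin one is found.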

The way to solve the problem is to "emulate" a kind of word boundary. For the left boundary, use a negated character class listing the characters that can be part of a word (since the class is negated, it matches any character that can't be part of the word) and put it in a capturing group, with ^ as an alternative for the start of the string. For the right boundary, a negative lookahead is much simpler.

toBeHighlightedWord = "([^\\w\\u0600-\\u06FF\\uFB50-\\uFDFF\\uFE70-\\uFEFF]|^)("
              + toBeHighlightedWord.replace(/([{}()[\]\\.?*+^$|=!:~-])/g, "\\$1")
              + ")(?![\\w\\u0600-\\u06FF\\uFB50-\\uFDFF\\uFE70-\\uFEFF])";
var r = new RegExp(toBeHighlightedWord, "ig");
str = str.replace(/(>[^<]+<)/g, function(a){
    // $1 is the captured left boundary, $2 is the word itself
    return a.replace(r, "$1<span color='red' class='hl'>$2</span>");
});
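For reference, the fragment above can be wrapped back into the function from the question. This is a minimal sketch, assuming the same highlight(str, toBeHighlightedWord) signature; the ARABIC_WORD_CHARS name and the sample markup are introduced here only for illustration:

```javascript
// Characters treated as "part of a word": \w plus the three Arabic ranges.
// (ARABIC_WORD_CHARS is an illustrative name, not part of the original answer.)
var ARABIC_WORD_CHARS = "\\w\\u0600-\\u06FF\\uFB50-\\uFDFF\\uFE70-\\uFEFF";

function highlight(str, toBeHighlightedWord) {
    // Escape regex metacharacters in the search term.
    var escaped = toBeHighlightedWord.replace(/([{}()[\]\\.?*+^$|=!:~-])/g, "\\$1");
    var r = new RegExp(
        "([^" + ARABIC_WORD_CHARS + "]|^)(" + escaped + ")(?![" + ARABIC_WORD_CHARS + "])",
        "ig"
    );
    // Only rewrite text that sits between a closing ">" and the next "<".
    return str.replace(/(>[^<]+<)/g, function (a) {
        return a.replace(r, "$1<span color='red' class='hl'>$2</span>");
    });
}

var word = "محمد";
var html = "<p>قال " + word + " نعم</p>";
var out = highlight(html, word);
console.log(out);
```

Here the space before the word is captured as $1 and written back, so only the word itself ends up inside the span.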

The character ranges used here come from three blocks of the Unicode table:

  • 0600-06FF (Arabic)
  • FB50-FDFF (Arabic Presentation Forms-A)
  • FE70-FEFF (Arabic Presentation Forms-B)

Note that the new capturing group changes the replacement pattern: the left boundary is now $1 and has to be written back, and the word itself becomes $2.
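The boundary emulation alone does not make a search for "محمد" find the same word written with tashkel, because the marks sit between the letters. One possible extension, which is not part of the answer above, is to allow optional marks after every letter of the search term. The sketch below assumes the marks of interest are U+064B to U+0652 (the common harakat inside the Arabic block); tashkeelTolerantSource is a hypothetical helper name:

```javascript
// Hypothetical helper: builds a regex source in which every letter of the
// word may be followed by any number of tashkeel marks (U+064B to U+0652).
function tashkeelTolerantSource(word) {
    var marks = "[\\u064B-\\u0652]*";
    // Drop marks already present in the search term so that both spellings
    // of the term produce the same pattern.
    var bare = word.replace(/[\u064B-\u0652]/g, "");
    return Array.from(bare).map(function (ch) {
        // Escape regex metacharacters, then allow optional marks.
        return ch.replace(/[{}()[\]\\.?*+^$|=!:~-]/g, "\\$&") + marks;
    }).join("");
}

var plain = "\u0645\u062D\u0645\u062F"; // the word without tashkel
var marked = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F\u064F"; // with tashkel

var tolerant = new RegExp(tashkeelTolerantSource(plain));
var matchesPlain = tolerant.test(plain);
var matchesMarked = tolerant.test(marked);
console.log(matchesPlain, matchesMarked);
```

The returned source could be used in place of the escaped word when building the pattern shown earlier. Other marks, such as U+0670, would need to be added to the range if the text uses them.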

Casimir et Hippolyte answered Nov 07 '22 07:11
