I'm using the following function to highlight a certain word, and it works fine in English:
function highlight(str,toBeHighlightedWord)
{
toBeHighlightedWord="(\\b"+ toBeHighlightedWord.replace(/([{}()[\]\\.?*+^$|=!:~-])/g, "\\$1")+ "\\b)";
var r = new RegExp(toBeHighlightedWord,"igm");
str = str.replace(/(>[^<]+<)/igm,function(a){
return a.replace(r,"<span color='red' class='hl'>$1</span>");
});
return str;
}
But it does not work for Arabic text.
How can I modify the regex so that it also matches Arabic words, including Arabic words with tashkel? Tashkel marks are characters added between the original letters, for example "محمد" (without tashkel) and "مُحَمَّدُ" (with tashkel). The tashkel is the decoration of the word, but these little marks are characters too.
In JavaScript, the word boundary \b only works with these characters: [a-zA-Z0-9_]. A lookbehind assertion is not useful here either, since that feature was not supported by JavaScript when this was written (it was only added in ES2018).
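To see the limitation concretely, here is a small check. The sample sentences are made up for illustration; the point is only that \b finds no boundary around an Arabic word, because Arabic letters are not in [a-zA-Z0-9_].

```javascript
// \b marks a position between a [A-Za-z0-9_] character and anything else.
// Arabic letters are "anything else", so a space next to an Arabic letter
// is not a boundary as far as \b is concerned.
const latin = /\bword\b/.test("a word here");
const arabic = /\bمحمد\b/.test("قال محمد هنا");

console.log(latin);  // true
console.log(arabic); // false
```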
The way to solve the problem and "emulate" a kind of word boundary is to use, for the left boundary, a negated character class in a capturing group. The class lists the characters that can be part of a word, so, being negated, it matches any character that can't be part of the word. For the right boundary, a negative lookahead is much simpler.
toBeHighlightedWord="([^\\w\\u0600-\\u06FF\\uFB50-\\uFDFF\\uFE70-\\uFEFF]|^)("
+ toBeHighlightedWord.replace(/([{}()[\]\\.?*+^$|=!:~-])/g, "\\$1")
+ ")(?![\\w\\u0600-\\u06FF\\uFB50-\\uFDFF\\uFE70-\\uFEFF])";
var r = new RegExp(toBeHighlightedWord, "ig");
str = str.replace(/(>[^<]+<)/g, function(a){
return a.replace(r, "$1<span color='red' class='hl'>$2</span>");
});
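The character ranges used here come from three blocks of the Unicode table: Arabic (U+0600 to U+06FF), Arabic Presentation Forms-A (U+FB50 to U+FDFF), and Arabic Presentation Forms-B (U+FE70 to U+FEFF).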
Note that the use of a new capturing group changes the replacement pattern.
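Putting the pieces together, a complete version could look roughly like the sketch below. Several things in it are my own assumptions rather than part of the snippet above: the names WORD_CHARS, TASHKEL and escapeRegex, the optional ignoreTashkel parameter, and the choice of U+064B to U+0652 as the range of tashkel marks. I also dropped the color attribute from the span and kept only the class.

```javascript
// Characters treated as "word" characters: \w plus the three Arabic ranges.
const WORD_CHARS = "\\w\\u0600-\\u06FF\\uFB50-\\uFDFF\\uFE70-\\uFEFF";
// Assumption: tashkel marks are the combining marks U+064B..U+0652.
const TASHKEL = "[\\u064B-\\u0652]*";

// Hypothetical helper: escape regex metacharacters in a literal string.
function escapeRegex(s) {
  return s.replace(/[{}()[\]\\.?*+^$|=!:~-]/g, "\\$&");
}

function highlight(str, word, ignoreTashkel) {
  // Optionally allow any number of tashkel marks after each letter.
  const core = ignoreTashkel
    ? Array.from(word).map(function (ch) { return escapeRegex(ch) + TASHKEL; }).join("")
    : escapeRegex(word);

  // Group 1: emulated left boundary. Group 2: the word itself.
  // The negative lookahead emulates the right boundary.
  const r = new RegExp(
    "([^" + WORD_CHARS + "]|^)(" + core + ")(?![" + WORD_CHARS + "])",
    "ig"
  );

  // Only touch text that sits between tags.
  return str.replace(/>[^<]+</g, function (text) {
    return text.replace(r, "$1<span class='hl'>$2</span>");
  });
}

console.log(highlight("<p>قال محمد هنا</p>", "محمد"));
```

Without the third argument this behaves like the snippet above, so a word written with tashkel is not matched by its plain form. Passing true is one possible way to tolerate the marks; if the search word itself contains tashkel, those marks are still required in the text.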