I have a string that contains a few words. I want to find all the words that contain only Tamil Unicode characters. I am new to JavaScript.
In Go, I do it like this:
tokens := strings.Split(stringContent, delim) // split on delim, e.g. a space
for _, token := range tokens { // like foreach
    r, l := utf8.DecodeRuneInString(token) // first rune and its length in bytes
    if l != 1 { // multi-byte, so not ASCII
        if unicode.Is(unicode.Tamil, r) {
            // Tamil word (only the first rune is checked here)
        }
    }
}
I found that string.split() in JavaScript gives me the individual words based on the delimiter, but I can't figure out how to check whether a word is a Tamil word. Can someone help me achieve this in JavaScript?
An easy way is to use a regular expression that matches words whose characters fall within a Unicode range.
This may help: http://kourge.net/projects/regexp-unicode-block
Here is a sample to start with:
"இந்தியா ASASAS எறத்தாழ ASSASAS குடியரசு ASWED SAASAS".match(/[\u0B80-\u0BFF]+/g);
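The match() call above returns every run of Tamil characters, including runs embedded in mixed words. If you need only the words made up entirely of Tamil characters, as the question asks, one possible sketch is to split first and test each word against an anchored version of the same range. The function name tamilWords and its delim parameter are made up for illustration, and it assumes the Tamil block U+0B80 to U+0BFF is the range you care about.

```javascript
// Hypothetical helper: returns the words that consist solely of
// characters from the Tamil Unicode block (U+0B80 to U+0BFF).
function tamilWords(text, delim = " ") {
  // ^ and $ anchor the pattern, so every character in the word must be in range.
  const tamilOnly = /^[\u0B80-\u0BFF]+$/;
  return text.split(delim).filter((word) => tamilOnly.test(word));
}

const sample = "இந்தியா ASASAS எறத்தாழ ASSASAS குடியரசு ASWED SAASAS";
console.log(tamilWords(sample)); // [ 'இந்தியா', 'எறத்தாழ', 'குடியரசு' ]
```

In engines that support Unicode property escapes (ES2018 and later), /^\p{Script=Tamil}+$/u should work as an alternative to the hard-coded range.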