Detect Unicode language of a string in JavaScript

Question

I have a string that contains a few words. I want to find all the words that contain only Tamil Unicode characters. I am new to JavaScript.

In Go, I do it like this:

            tokens := strings.Fields(stringContent) // split on whitespace

            for _, token := range tokens { //like foreach
                r, l := utf8.DecodeRuneInString(token)
                if l != 1 {
                    if unicode.Is(unicode.Tamil, r) {
                        // Tamil word
                    }
                }
            }

I found that string.split() gives me the individual words based on a delimiter in JavaScript, but I cannot figure out how to check whether a word is a Tamil word. Can someone help me achieve this in JavaScript?

Diode · Accepted Answer

An easy way is to use a regular expression that matches runs of characters in the Tamil Unicode block (U+0B80 to U+0BFF).

This reference on Unicode blocks in regular expressions may help: http://kourge.net/projects/regexp-unicode-block

Here is a sample to start from:

"இந்தியா ASASAS எறத்தாழ ASSASAS குடியரசு ASWED SAASAS".match(/[\u0B80-\u0BFF]+/g);
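Note that the sample above returns every run of Tamil characters, so it would also pick up the Tamil part of a mixed word. If you need only the words made up entirely of Tamil characters, as the question asks, one possible sketch is to split on whitespace and test each word against an anchored version of the same range. The function name `tamilWords` is just an illustrative choice, and the sketch assumes words are separated by whitespace:

```javascript
// Hypothetical helper (name chosen for illustration): returns the
// whitespace-separated words that consist only of characters from the
// Tamil Unicode block, U+0B80 to U+0BFF.
function tamilWords(text) {
  // ^ and $ anchor the match so the whole word must be in the range.
  const tamilOnly = /^[\u0B80-\u0BFF]+$/;
  // Empty strings produced by leading or trailing whitespace fail the
  // test because + requires at least one character.
  return text.split(/\s+/).filter((word) => tamilOnly.test(word));
}

const sample = "இந்தியா ASASAS எறத்தாழ ASSASAS குடியரசு ASWED SAASAS";
console.log(tamilWords(sample));
// → [ 'இந்தியா', 'எறத்தாழ', 'குடியரசு' ]

// A word that mixes Tamil and Latin letters is rejected.
console.log(tamilWords("தமிழ்abc"));
// → []
```

In engines that support Unicode property escapes (ES2018 and later), the pattern `/^\p{Script=Tamil}+$/u` could be used in place of the hard-coded range.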

Tags:

javascript

html

string

Sankar
