using javascript, how can I count a mix of asian characters and english words

Question

I need to take a string that mixes Asian characters (for now, assume only Chinese kanji or Japanese kanji/hiragana/katakana) with "alphanumeric" text (e.g., English, French), and count it in the following way:

1) count each Asian CHARACTER as 1;
2) count each alphanumeric WORD as 1.

A few examples:

株式会社myCompany = 4 chars + 1 word = 5 total
株式会社マイコ = 7 chars

My only idea so far is to use:

var wordArray=val.split(/\w+/);

and then check each element to see if its contents are alphanumeric (so count as 1) or not (so take the array length). But I don't feel that's really very clever at all, and the text being counted might be up to 10,000 words, so it's not very quick either.

Ideas?

bobince · Accepted Answer

Unfortunately JavaScript's RegExp has traditionally had no support for Unicode character classes (property escapes such as \p{...} only arrived with the u flag in ES2018); \w only applies to ASCII characters (modulo some browser bugs).

You can use Unicode characters in character classes, though, so you can do it if you can express each set of characters you are interested in as a range, e.g.:

var r= new RegExp(
    '[A-Za-z0-9_]+|'+                              // ASCII letters (no accents)
    '[\u3040-\u309F]+|'+                           // Hiragana
    '[\u30A0-\u30FF]+|'+                           // Katakana
    '[\u4E00-\u9FFF\uF900-\uFAFF\u3400-\u4DBF]',   // Single CJK ideographs
'g');

// match() returns null when nothing matches, so fall back to an empty array
var nwords= (str.match(r) || []).length;

(This attempts to give a more realistic count of ‘words’ for Japanese, counting each run of one type of kana as a word. That's still not right, of course, but it's probably closer than treating each syllable as one word.)
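If the count should instead follow the rule in the question, where every kana character counts as 1, the kana alternatives in the pattern above can lose their + quantifier and be folded into the single-character class. The following is a minimal sketch under that assumption; countUnits is a hypothetical name, and the ranges are the same basic-multilingual-plane ranges used above.

```javascript
// Sketch only: counts each ASCII word as 1 and each kana/CJK character as 1.
// countUnits is a hypothetical helper name, not part of the original answer.
function countUnits(str) {
    var r = new RegExp(
        '[A-Za-z0-9_]+|' +               // a run of ASCII word characters = 1
        '[\u3040-\u309F' +               // one hiragana character = 1
        '\u30A0-\u30FF' +                // one katakana character = 1
        '\u4E00-\u9FFF\uF900-\uFAFF\u3400-\u4DBF]',  // one CJK ideograph = 1
        'g');
    // match() returns null when nothing matches
    return (str.match(r) || []).length;
}
```

With the examples from the question, countUnits('株式会社myCompany') gives 5 and countUnits('株式会社マイコ') gives 7.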

Obviously there are many more characters that would have to be accounted for if you wanted to ‘do it properly’. Let's hope you don't have characters outside the basic multilingual plane, for one!
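On engines that implement ES2018, Unicode property escapes with the u flag can stand in for the hand-written ranges, and they also match characters outside the basic multilingual plane by code point. The sketch below is not the method from the answer above but a variation: it assumes such an engine, counts every Han, hiragana, or katakana character as 1 (the rule in the question), and treats runs of Latin-script letters, decimal digits, and underscores as words so that accented letters are included. countMixed is a hypothetical name.

```javascript
// Sketch only: requires ES2018 Unicode property escapes (the 'u' flag).
// countMixed is a hypothetical helper name used for illustration.
function countMixed(str) {
    // First alternative: exactly one Han/hiragana/katakana character.
    // Second alternative: a run of Latin letters, decimal digits, or '_'.
    var r = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]|[\p{Script=Latin}\p{Nd}_]+/gu;
    // match() returns null when nothing matches
    return (str.match(r) || []).length;
}
```

Treat the property list as a starting point: some marks used in kana text, such as the prolonged sound mark ー, are classified as common-script characters rather than katakana, so they are not matched here even though the \u30A0-\u30FF range above does include them.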

Tags: javascript, text, character, counting
