Count number of characters present in foreign language

Tags: javascript, character-encoding

Is there a good way to implement a character count for non-English letters? For example, the English word "Mother" has 6 letters. The same word typed in Tamil (மதர்) has three letters (ம + த + ர்), but the system counts the last letter (ர்) as two characters (ர + ் = ர்). Is there any way to count the number of real characters?
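To see the "two characters" the system counts, you can print the code points the string actually stores. A minimal sketch (the helper name codePoints is hypothetical, chosen for illustration):

```javascript
// Lists the code points stored in a string, in U+XXXX notation.
// Array.from iterates by code point, so surrogate pairs stay together.
function codePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const word = "மதர்";
console.log(word.length);                // 4 (UTF-16 code units)
console.log(codePoints(word).join(" ")); // U+0BAE U+0BA4 U+0BB0 U+0BCD
```

The last entry, U+0BCD, is the Tamil pulli (virama) ், stored as its own code point after ர (U+0BB0); String.prototype.length counts UTF-16 code units, so it reports 4.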

One clue: if I move the cursor through the word (மதர்) with the keyboard, it passes over only 3 letters, not the 4 characters the system counts. Is there any way to use this to find a solution? Any help on this would be greatly appreciated.
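The cursor behaviour described above corresponds to what Unicode calls extended grapheme clusters (Unicode Standard Annex #29), and modern JavaScript engines expose that segmentation through Intl.Segmenter. This is a different technique from the one in the answer below; a minimal sketch, assuming an engine that supports Intl.Segmenter (current browsers, Node.js 16+). The name countGraphemes is hypothetical:

```javascript
// Grapheme segmentation is essentially locale-independent, so no locale is passed.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Counts user-perceived characters (extended grapheme clusters).
function countGraphemes(str) {
  return [...segmenter.segment(str)].length;
}

// "me\u0323" is "mẹ" in decomposed form: e followed by U+0323 COMBINING DOT BELOW
for (const word of ["Mother", "மதர்", "me\u0323", "मां"]) {
  console.log(word, "graphemes:", countGraphemes(word), "code units:", word.length);
}
```

For these four words this should report 6, 3, 2 and 1 graphemes, against 6, 4, 3 and 3 UTF-16 code units.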


asked Dec 11 '12 07:12

Stranger

1 Answer

Update

Back from lunch =) I'm afraid the previous approach won't work that well for every foreign language, so I added another fiddle with a possible alternative:

// Unicode property escape (ES2018) standing in for the original [Array 1280]
// of escaped non-spacing marks. \p{M} also covers spacing marks (Mc) such as
// the Hindi vowel sign ा (U+093E), which a Mn-only list would still count.
const UnicodeMark = /\p{M}/u;

function countNSMString(str) {
  let count = 0;
  // for...of walks code points, so surrogate pairs are not split
  for (const ch of str) {
    if (!UnicodeMark.test(ch)) {
      count++; // count everything except combining marks
    }
  }
  return count;
}

const English = "Mother";
const Tamil = "மதர்";
const Vietnamese = "me\u0323"; // "mẹ" in decomposed form: e + U+0323 COMBINING DOT BELOW
const Hindi = "मां";

function logL(str) {
  console.log(str + " has " + countNSMString(str) + " visible characters and " + str.length + " UTF-16 code units");
}

logL(English);    // "Mother has 6 visible characters and 6 UTF-16 code units"
logL(Tamil);      // "மதர் has 3 visible characters and 4 UTF-16 code units"
logL(Vietnamese); // "mẹ has 2 visible characters and 3 UTF-16 code units"
logL(Hindi);      // "मां has 1 visible characters and 3 UTF-16 code units"

So this just checks each character in the string and leaves it out of the count if it is a Unicode non-spacing mark (NSM). This should work for most languages, not only Tamil, and a lookup against 1280 entries shouldn't be much of a performance issue.

Here is a list of the Unicode non-spacing marks (general category Mn): http://www.fileformat.info/info/unicode/category/Mn/list.htm

Here is the corresponding JSBin.
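The linked list covers general category Mn only. Some dependent vowel signs are spacing combining marks (category Mc) instead, for example the Hindi ा (U+093E) in मां and the Tamil ा (U+0BBE) in மா, so a Mn-only lookup counts them as separate visible characters. A small comparison, assuming an ES2018 engine with Unicode property escapes (countWithout is a hypothetical helper name):

```javascript
// Counts code points that do NOT match the given mark pattern.
function countWithout(markPattern, str) {
  let count = 0;
  for (const ch of str) {
    if (!markPattern.test(ch)) count++;
  }
  return count;
}

const nonspacingOnly = /\p{Mn}/u; // what the linked Mn list covers
const allMarks = /\p{M}/u;        // Mn + Mc (spacing) + Me (enclosing)

const words = [
  "\u092E\u093E\u0902",       // मां: MA + vowel sign AA (Mc) + anusvara (Mn)
  "\u0BAE\u0BBE",             // மா: MA + vowel sign AA (Mc)
  "\u0BAE\u0BA4\u0BB0\u0BCD", // மதர்: the pulli U+0BCD is Mn
];
for (const w of words) {
  console.log(w, countWithout(nonspacingOnly, w), countWithout(allMarks, w));
}
```

With \p{Mn} the first two words come out as 2; with \p{M} both come out as 1, while மதர் is 3 either way because its pulli is already in category Mn.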

After experimenting a bit with string operations, it turns out that String.indexOf returns the same position for "ர்" and for "ர", because ர் is stored as ர followed by ் and therefore starts at the same index:

"ர்ரர".indexOf("ர") == "ர்ரர".indexOf("ர" + "்") // true (both 0), but
"ர்ரர".indexOf("ர") == "ர்ரர".indexOf("ர" + "ர") // false (0 vs 2)

I took advantage of this and tried something like the following:

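// Test strings made of ர (U+0BB0) and the pulli ் (U+0BCD)
const char = "ரர்ர்ரர்்";
const char2 = "ரரர்ர்ரர்்";
const char3 = "ர்ரர்ர்ரர்்";

// Heuristic: if the first occurrence of chars[i] is also the first occurrence of
// the pair chars[i] + chars[i + 1], treat the pair as a single letter.
// indexOf searches from the start of str (not from i), so this only fits strings like these.
function countStr(str) {
  const chars = str.split("");
  let count = 0;
  for (let i = 0, ilen = chars.length; i < ilen; i++) {
    const chars2 = chars[i] + chars[i + 1]; // ends in "undefined" on the last character
    if (str.indexOf(chars[i]) === str.indexOf(chars2)) {
      i += 1; // skip the next code unit; it is counted together with this one
    }
    count++;
  }
  return count;
}

console.log("--");
console.log(countStr(char));  // 6
console.log(countStr(char2)); // 7
console.log(countStr(char3)); // 7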

This seems to work for the strings above. It may need some adjustments, as I don't know much about encodings, but maybe it's a point you can start from.

Here's the JSBin.
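The indexOf comparison looks at the first occurrence of each character in the whole string rather than at position i, so it can pair unrelated letters: for the question's own word, countStr("மதர்") returns 2, because it pairs ம with த. A position-based sketch of the same pairing idea, assuming an ES2018 engine with Unicode property escapes, attaches each combining mark (\p{M}) to the character before it and returns the groups. splitClusters is a hypothetical name, and this is a simplification, not full Unicode grapheme segmentation:

```javascript
const MARK = /\p{M}/u;

// Groups each base character with the combining marks that follow it.
function splitClusters(str) {
  const clusters = [];
  for (const ch of str) {
    if (MARK.test(ch) && clusters.length > 0) {
      clusters[clusters.length - 1] += ch; // attach mark to the previous group
    } else {
      clusters.push(ch); // start a new group (also used for a leading stray mark)
    }
  }
  return clusters;
}

for (const s of ["மதர்", "ரர்ர்ரர்்", "ரரர்ர்ரர்்", "ர்ரர்ர்ரர்்"]) {
  console.log(s, splitClusters(s).length, splitClusters(s).join(" | "));
}
```

For மதர் this gives the three groups ம | த | ர். For the three test strings it gives 5, 6 and 6 rather than 6, 7 and 7, because a doubled ் stays with the ர் in front of it instead of being counted on its own; which result you want depends on how stray marks should be treated.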


answered Oct 05 '22 23:10

Moritz Roessler
