Split Kannada word into syllabic clusters

Question

Is there a way to split a Kannada word into its syllabic clusters using JavaScript?

For example, I want to split the word ಕನ್ನಡ into the syllabic clusters ["ಕ", "ನ್ನ", "ಡ"]. But when I split it with split(''), the array I actually get is ["ಕ", "ನ", "್", "ನ", "ಡ"].

bugs_cena · Accepted Answer

This is not a complete solution, but it works to an extent, based on a basic understanding of how words are formed:

var k = 'ಕನ್ನಡ';
var arr = [];
for (var i = 0; i < k.length; i++) {
  var s = k.charAt(i);

  // Keep appending while the next char is not a swara/vyanjana
  // (outside U+0C85..U+0CB9), or the current char is a virama (U+0CCD).
  // The parentheses make the bounds check apply to every alternative.
  while ((i + 1) < k.length &&
         (k.charCodeAt(i + 1) < 0xC85 || k.charCodeAt(i + 1) > 0xCB9 || k.charCodeAt(i) == 0xCCD)) {
    s += k.charAt(i + 1);
    i++;
  }
  arr.push(s);
}
console.log(arr);

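As the comment in the code says, we keep appending the next character to the current cluster as long as it is not a swara or vyanjana (that is, it falls outside U+0C85 to U+0CB9), or as long as the character just appended is a virama (U+0CCD). You may have to test with different words to make sure you cover different cases. In particular, this does not handle numbers: the Kannada digits (U+0CE6 to U+0CEF) lie outside the swara/vyanjana range, so they get appended to the preceding cluster.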
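The same rule can be wrapped in a reusable function. This is only a sketch: the name splitKannadaClusters and the helpers isBase and isDigit are hypothetical, and treating each Kannada digit as its own cluster is an assumption that goes beyond the code above.

```javascript
// Hypothetical helper; the name and the digit handling are assumptions.
function splitKannadaClusters(word) {
  var VIRAMA = 0x0CCD;

  // Swara/vyanjana (independent vowels and consonants): U+0C85..U+0CB9.
  function isBase(code) { return code >= 0x0C85 && code <= 0x0CB9; }
  // Kannada digits: U+0CE6..U+0CEF.
  function isDigit(code) { return code >= 0x0CE6 && code <= 0x0CEF; }

  var clusters = [];
  for (var i = 0; i < word.length; i++) {
    var s = word.charAt(i);
    while (i + 1 < word.length) {
      var current = word.charCodeAt(i);
      var next = word.charCodeAt(i + 1);
      // Assumption: a digit never joins a neighbouring cluster.
      if (isDigit(current) || isDigit(next)) break;
      // A base letter starts a new cluster unless a virama precedes it.
      if (isBase(next) && current !== VIRAMA) break;
      s += word.charAt(i + 1);
      i++;
    }
    clusters.push(s);
  }
  return clusters;
}

console.log(splitKannadaClusters('\u0C95\u0CA8\u0CCD\u0CA8\u0CA1')); // the word ಕನ್ನಡ
```

Like the loop it is based on, this works on UTF-16 code units and only knows about the ranges listed in its comments, so it should still be checked against more words.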

For character codes, you can refer to the Unicode chart for Kannada: http://www.unicode.org/charts/PDF/U0C80.pdf
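When trying other words, it helps to print the code points so you can look them up in the chart. A small sketch (the function name codePoints is just an illustration):

```javascript
// Hypothetical helper: lists each UTF-16 code unit of a word as "U+XXXX".
function codePoints(word) {
  var out = [];
  for (var i = 0; i < word.length; i++) {
    var hex = word.charCodeAt(i).toString(16).toUpperCase();
    // Left-pad to four hex digits, matching the notation used in the chart.
    out.push('U+' + ('0000' + hex).slice(-4));
  }
  return out;
}

console.log(codePoints('ಕನ್ನಡ'));
// ["U+0C95", "U+0CA8", "U+0CCD", "U+0CA8", "U+0CA1"]
```

The third entry, U+0CCD, is the virama that joins the two ನ characters into one cluster.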

Tags: javascript, arrays, split, kannada

mpsbhat