I am wondering whether there is a way to split a Kannada word into its syllabic clusters using JavaScript.
For example, I want to split the word ಕನ್ನಡ into the syllabic clusters ["ಕ", "ನ್ನ", "ಡ"]. But when I split it with split(''), the array I actually get is ["ಕ", "ನ", "್", "ನ", "ಡ"].
I cannot say that this is a complete solution, but it works to an extent and relies only on a basic understanding of how words are formed:
var k = 'ಕನ್ನಡ';
var arr = [];
for (var i = 0; i < k.length; i++) {
    var s = k.charAt(i);
    // keep appending while there is a next char and either it is not a
    // swara/vyanjana (outside U+0C85..U+0CB9) or the current char is a virama (U+0CCD)
    while ((i + 1) < k.length &&
           (k.charCodeAt(i + 1) < 0xC85 || k.charCodeAt(i + 1) > 0xCB9 || k.charCodeAt(i) == 0xCCD)) {
        s += k.charAt(i + 1);
        i++;
    }
    arr.push(s);
}
console.log(arr); // ["ಕ", "ನ್ನ", "ಡ"]
As the comments in the code say, we keep appending chars to the previous char as long as the next char is not a swara or vyanjana, or the char just appended was a virama. You will need to test with different words to make sure other cases are covered. In particular, this code does not handle the Kannada digits (U+0CE6 to U+0CEF): they fall above 0xCB9, so they would be appended to the preceding cluster.
For Character codes you can refer to this link: http://www.unicode.org/charts/PDF/U0C80.pdf
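The same idea can also be written as a single regular expression. The following is only a sketch, not a complete solution: the function name splitKannadaClusters is made up for illustration, and it assumes an engine that supports Unicode property escapes (\p{M}) with the u flag. It treats a cluster as one base character followed by any run of combining marks or virama + consonant pairs, and it does not try to handle ZWJ/ZWNJ or other special cases.

```javascript
// Hypothetical helper: split a Kannada word into approximate syllabic clusters.
// Assumption: consonants lie in U+0C95..U+0CB9 and the virama is U+0CCD.
function splitKannadaClusters(word) {
    // One cluster is:
    //   a non-mark base character, followed by any number of
    //     - virama + consonant (a conjunct continuation), or
    //     - any other combining mark (vowel sign, anusvara, trailing virama, ...)
    // The second alternative keeps stray leading marks instead of dropping them.
    var cluster = /\P{M}(?:\u0CCD[\u0C95-\u0CB9]|\p{M})*|\p{M}+/gu;
    return word.match(cluster) || [];
}

console.log(splitKannadaClusters('ಕನ್ನಡ')); // ["ಕ", "ನ್ನ", "ಡ"]
```

Unlike the loop above, digits and other non-mark characters end up as clusters of their own here, because only combining marks and virama + consonant pairs are attached to the preceding base character.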