We tend to believe Python 3 got everything right on Unicode, so I was surprised when I ran into this situation:
>>> amma = "அம்மா"
>>> amma
'அம்மா'
>>> len(amma)
5
The Tamil string "அம்மா" has 3 letters, so a return value of 5 from len("அம்மா") is hard to accept.
How do other Dravidian or Brahmic scripts deal with this issue to get the right string length?
Edit #1: Considering @joey's comment, this question can be rephrased as follows.
How to calculate the grapheme length in Python?
We know Swift and Perl 6 do this by default:
2> let amma = "அம்மா".characters.count
amma: Distance = 3
It may have 3 letters, but it has 5 characters (Unicode code points, which is what len() counts):
$ charinfo 'அம்மா'
U+0B85 TAMIL LETTER A [Lo]
U+0BAE TAMIL LETTER MA [Lo]
U+0BCD TAMIL SIGN VIRAMA [Mn]
U+0BAE TAMIL LETTER MA [Lo]
U+0BBE TAMIL VOWEL SIGN AA [Mc]
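The same breakdown can be reproduced from Python itself with the standard unicodedata module. This is only a sketch: the helper name describe_code_points is made up for illustration, and the string is written with escapes so the code points are unambiguous.

```python
import unicodedata


def describe_code_points(text):
    """Return one (code point, name, category) tuple per code point in text.

    describe_code_points is a hypothetical helper, not a standard function.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]


# "அம்மா" spelled out as its five code points
amma = "\u0b85\u0bae\u0bcd\u0bae\u0bbe"

for code_point, name, category in describe_code_points(amma):
    print(code_point, name, f"[{category}]")
```

len() simply counts the entries in that list, which is why it reports 5.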
If you need to be more specific, count only the characters whose general category is a Letter category (the ones starting with L, such as Lo above).
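A minimal sketch of that idea, using only the standard library. The function name count_letters is hypothetical, and note that this counts Letter-category code points, which is not the same thing as counting grapheme clusters: digits, punctuation and emoji would all count as zero here.

```python
import unicodedata


def count_letters(text):
    """Count code points whose general category starts with 'L' (Lu, Ll, Lo, ...).

    count_letters is a hypothetical helper. Marks such as the virama (Mn)
    and vowel signs (Mc) are skipped, so they do not add to the total.
    """
    return sum(1 for ch in text if unicodedata.category(ch).startswith("L"))


# "அம்மா" spelled out as its five code points
amma = "\u0b85\u0bae\u0bcd\u0bae\u0bbe"

print(len(amma))            # 5 code points
print(count_letters(amma))  # 3 letters
```

For this string the letter count matches what a Tamil reader would expect, but if you really want grapheme length in general you would need segmentation that follows the Unicode text segmentation rules rather than a category filter.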