I am writing several programs to handle text analysis in several languages, including Latin, Ancient Greek, and Mandarin. One of them is meant to take a Latin word, decompose it into its syllables, find which syllable is stressed, and add an acute accent to the vowel of that syllable. For long vowels such as 'ā' this requires a combining acute accent (U+0301) to produce 'ā́'. But when I place the combining accent ('\u0301') in a string after the character I want to add it to, the two are not combined when the string is printed; they are displayed next to each other as separate characters. Also, when I try to display non-Western Unicode characters such as Japanese Hiragana, Katakana, or CJK Unified Ideographs, all I get is the question mark in a box that appears when a system can't display a character. I don't have these issues with combining characters or CJK Unified Ideographs elsewhere; they work fine in Google Chrome and Microsoft Word, for example. I am running Python 3 on a 64-bit laptop with Windows 10. Also, how can I handle these issues if they come up with Sqlite3?
You can normalize combining accents to the composed form, for example NFC:
>>> from unicodedata import normalize
>>> char = 'a'
>>> accent = '\u0301'
>>> normalize("NFC", char + accent)
'á' # this is a length 1 string
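As for ā́, I think the shortest it can be in Python is length 2 ('\u0101\u0301'): as far as I know Unicode has no single precomposed code point for an a with both a macron and an acute, so NFC leaves the accent as a separate character. It is then up to the terminal emulator to combine the glyphs for the letter and the accent correctly when rendering.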
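To make the difference concrete, here is a minimal sketch that applies the same normalization to a short and a long vowel and inspects the resulting code points. The helper name `add_acute` is made up for illustration, and the output is printed as hex code points so it does not depend on what the console can draw.

```python
from unicodedata import normalize

ACUTE = '\u0301'  # COMBINING ACUTE ACCENT


def add_acute(vowel):
    """Append a combining acute accent and compose with NFC where possible."""
    # Hypothetical helper, not part of any library.
    return normalize("NFC", vowel + ACUTE)


short_a = add_acute('a')       # composes to the single code point U+00E1
long_a = add_acute('\u0101')   # stays as U+0101 followed by U+0301

# Print code points rather than glyphs, so the result is readable
# even on a console whose font cannot render the characters.
print(len(short_a), [hex(ord(c)) for c in short_a])
print(len(long_a), [hex(ord(c)) for c in long_a])
```

If the second string shows two code points here, the string itself is correct and any display problem is on the rendering side.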
As for the Japanese characters not rendering correctly (the question mark in a box you get when a system can't display a character), this is not a matter of programming or encoding. It means the font in use has no glyph for those characters, so you need to install a font that covers them and use it where the output is displayed. On Linux I use GNU Unifont; I'm not sure what to use on Windows 10.
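On the Sqlite3 part of the question, a rough sketch only: Python's `sqlite3` module accepts and returns `str` for TEXT columns, so the characters themselves should round-trip without extra work. SQLite does not normalize for you, though, so composed and decomposed forms of the same letter would compare as different strings; normalizing before inserting, as below, is one way to keep the data consistent. The table name `words` and column name `form` are invented for the example.

```python
import sqlite3
from unicodedata import normalize

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (form TEXT)")

# Normalize to NFC before storing so equal-looking strings compare equal.
samples = [
    normalize("NFC", 'a\u0301'),       # composed a with acute
    normalize("NFC", '\u0101\u0301'),  # a with macron plus combining acute
    '\u304b\u306a',                    # Hiragana
    '\u6f22\u5b57',                    # CJK Unified Ideographs
]
conn.executemany("INSERT INTO words (form) VALUES (?)", [(w,) for w in samples])

stored = [row[0] for row in conn.execute("SELECT form FROM words ORDER BY rowid")]
conn.close()

# Compare the data instead of printing glyphs the console may not render.
print(stored == samples)
```

If the comparison holds, the database contents are intact and any boxes on screen are again a font issue.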