If you're looping through the characters of a Unicode string in Python (2.x), say:
ak.sɛp.tɑ̃
How can you tell whether the current char is a combining diacritic mark?
For instance, the last char in the above string is actually a combining mark:
ak.sɛp.tɑ̃ --> ̃ (U+0303 COMBINING TILDE)
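Use the standard-library unicodedata module. unicodedata.combining() returns a character's canonical combining class as an integer, which is 0 for characters that are not combining: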
import unicodedata

# combining() returns the canonical combining class; 0 means "not combining"
if unicodedata.combining(u'a'):
    print("is combining character")
else:
    print("is not combining")
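Applied to the loop from the question, a minimal sketch might look like the following. The helper names `combining_positions` and `is_mark` are hypothetical, made up for this example. `combining()` only reports a nonzero canonical combining class, and a few marks have class 0 (for example U+034F COMBINING GRAPHEME JOINER), so `is_mark` shows an alternative that checks the general category (Mn, Mc or Me) via `unicodedata.category()`. Which test fits depends on what you count as "combining". The source is kept ASCII-only and uses single-argument `print`, so it should run on both Python 2 and 3.

```python
import unicodedata

# The question's example string, spelled with escapes so the source stays
# ASCII (Python 2 needs a coding declaration for non-ASCII source bytes):
# U+025B is LATIN SMALL LETTER OPEN E, U+0251 is LATIN SMALL LETTER ALPHA,
# U+0303 is COMBINING TILDE.
word = u"ak.s\u025bp.t\u0251\u0303"


def combining_positions(text):
    """Return (index, code point, name) for each character with a nonzero combining class."""
    hits = []
    for i, ch in enumerate(text):
        # combining() returns the canonical combining class; 0 means "not combining"
        if unicodedata.combining(ch):
            hits.append((i, "U+%04X" % ord(ch), unicodedata.name(ch)))
    return hits


def is_mark(ch):
    """Alternative test: general category Mn, Mc or Me (any combining mark)."""
    return unicodedata.category(ch).startswith("M")


for index, codepoint, name in combining_positions(word):
    print("%d %s %s" % (index, codepoint, name))  # prints: 9 U+0303 COMBINING TILDE
```

The index is 9 because "ɑ̃" is two code points: the base letter at index 8 and the combining tilde after it.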
These posts are also relevant:
How do I reverse Unicode decomposition using Python?
What is the best way to remove accents in a Python unicode string?
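Both related tasks build on the same module. A common approach, sketched here and not necessarily the answer those threads settle on, is to decompose with NFD and drop every character for which `combining()` is nonzero (stripping accents), and to normalize to NFC to recompose a decomposed string. `strip_combining` and `recompose` are hypothetical helper names.

```python
import unicodedata


def strip_combining(text):
    """Decompose text (NFD), then drop every character with a nonzero combining class."""
    decomposed = unicodedata.normalize("NFD", text)
    return u"".join(ch for ch in decomposed if not unicodedata.combining(ch))


def recompose(text):
    """Reverse canonical decomposition by normalizing to NFC."""
    return unicodedata.normalize("NFC", text)


print(strip_combining(u"caf\u00e9"))             # prints: cafe
print(recompose(u"cafe\u0301") == u"caf\u00e9")  # prints: True
```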