I am trying to convert some words that contain Turkish characters to lowercase.
I read the words from a UTF-8 encoded file:
with open(filepath, 'r', encoding='utf8') as f:
    text = f.read().lower()
When I convert to lowercase, the Turkish character İ gets corrupted. However, when I convert to uppercase, it works fine.
Here is example code:
word = 'İşbirliği'  # renamed from `str` to avoid shadowing the built-in type
print(word)
print(word.lower())
Here is how it looks when it is corrupted:
What's going on here?
Some info that might be useful:
It's not corrupted.
Turkish has both a dotted lowercase i and a dotless lowercase ı, and similarly a dotted uppercase İ and a dotless uppercase I.
This presents a challenge when converting the dotted uppercase İ to lowercase: how to retain the information that, if it needs to be converted back to uppercase, it should be converted back to the dotted İ?
Unicode solves this problem as follows: when İ is converted to lowercase, it is actually converted to the standard Latin i plus the combining character U+0307 "COMBINING DOT ABOVE". What you're seeing is your terminal's inability to properly render the combining character (or, more to the point, to refrain from rendering it as a separate glyph); the display problem has nothing to do with Python.
You can see that this is happening using unicodedata.name():
>>> import unicodedata
>>> [unicodedata.name(c) for c in 'İ']
['LATIN CAPITAL LETTER I WITH DOT ABOVE']
>>> [unicodedata.name(c) for c in 'İ'.lower()]
['LATIN SMALL LETTER I', 'COMBINING DOT ABOVE']
... although, in a working and correctly configured terminal, it will render without any problems:
>>> 'İ'.lower()
'i̇'
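One practical consequence: because the dot is a separate code point, the lowercased string is one code point longer than the original, even when it renders identically. A minimal sketch to check this (the variable names are illustrative only, and the escapes are used so the example does not depend on how the source file is encoded):

```python
# 'İşbirliği', with U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) first
word = "\u0130\u015fbirli\u011fi"
lowered = word.lower()

# The lowercase form gains one code point: 'i' followed by U+0307.
print(len(word), len(lowered))                    # 9 10
print([f"U+{ord(c):04X}" for c in lowered[:2]])   # ['U+0069', 'U+0307']
```

This matters if you later compare, index, or count characters in the lowercased text.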
As a side note, if you do convert it back to uppercase, it will remain in the decomposed form:
>>> [unicodedata.name(c) for c in 'İ'.lower().upper()]
['LATIN CAPITAL LETTER I', 'COMBINING DOT ABOVE']
… although you can recombine it with unicodedata.normalize():
>>> [unicodedata.name(c) for c in unicodedata.normalize('NFC','İ'.lower().upper())]
['LATIN CAPITAL LETTER I WITH DOT ABOVE']
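If what you actually want is Turkish casing (İ to i, and I to ı), str.lower() will not do it on its own, since Python's case conversion is locale-independent. A minimal sketch, assuming only the two Turkish-specific pairs need special treatment; turkish_lower and turkish_upper are hypothetical helper names, not part of the standard library:

```python
# Hypothetical helpers: map the Turkish-specific letters by hand first,
# then let the locale-independent str.lower()/str.upper() handle the rest.
_TR_LOWER = str.maketrans({"\u0130": "i", "I": "\u0131"})  # İ -> i, I -> ı
_TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})  # i -> İ, ı -> I


def turkish_lower(text: str) -> str:
    """Lowercase text using Turkish rules for the dotted and dotless i."""
    return text.translate(_TR_LOWER).lower()


def turkish_upper(text: str) -> str:
    """Uppercase text using Turkish rules for the dotted and dotless i."""
    return text.translate(_TR_UPPER).upper()


print(turkish_lower("\u0130\u015fbirli\u011fi"))  # işbirliği
print(turkish_upper("\u0131\u015f\u0131k"))       # IŞIK
```

Note that this applies the Turkish rules to every I and i in the input, so it is only appropriate for text you know to be Turkish.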