
Python UTF-8 Lowercase Turkish Specific Letter

Using Python 2.7:

>>> myCity = 'Isparta'
>>> myCity.lower()
'isparta'
# should be:
'ısparta'

I tried decoding first (for example, myCity.decode("utf-8").lower()), but could not get the expected result.

How can I lowercase letters like these? ('I' > 'ı', 'İ' > 'i', etc.)

EDIT: In Turkish, the lowercase of 'I' is 'ı', and the uppercase of 'i' is 'İ'.

ayyayyekokojambo asked Sep 26 '13 14:09




2 Answers

Some have suggested using the tr_TR.utf8 locale. At least on Ubuntu, perhaps related to this bug, setting this locale does not produce the desired result:

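# -*- coding: utf-8 -*-
import locale
locale.setlocale(locale.LC_ALL, 'tr_TR.utf8')

myCity = u'Isparta İsparta'
# The Turkish locale has no effect here: both I and İ come out as a plain i.
print(myCity.lower())
# isparta isparta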

So if this bug affects you, as a workaround you could perform this translation yourself:

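# Map the two Turkish-specific capitals by code point before lowercasing.
lower_map = {
    ord(u'I'): u'ı',
    ord(u'İ'): u'i',
    }

myCity = u'Isparta İsparta'
# translate() handles I and İ; lower() then takes care of every other letter.
lowerCity = myCity.translate(lower_map).lower()
print(lowerCity)

prints

ısparta isparta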
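
If you need both directions, the same idea can be wrapped in two small helpers. This is only a sketch: the names tr_lower and tr_upper are made up for illustration, and it assumes the input is already a Unicode string (unicode on Python 2.7, str on Python 3). The u'' prefixes are accepted by both versions; on 2.7 the coding line must be the first or second line of the file.

```python
# -*- coding: utf-8 -*-

# Hypothetical helpers: only the dotted/dotless I pairs are special-cased,
# everything else is left to the built-in case conversion.
TR_LOWER_MAP = {ord(u'I'): u'ı', ord(u'İ'): u'i'}
TR_UPPER_MAP = {ord(u'ı'): u'I', ord(u'i'): u'İ'}


def tr_lower(text):
    """Lowercase a Unicode string using the Turkish rules for I and İ."""
    return text.translate(TR_LOWER_MAP).lower()


def tr_upper(text):
    """Uppercase a Unicode string using the Turkish rules for ı and i."""
    return text.translate(TR_UPPER_MAP).upper()


print(tr_lower(u'ISPARTA İZMİR'))  # ısparta izmir
print(tr_upper(u'ısparta izmir'))  # ISPARTA İZMİR
```

The translation has to run before lower()/upper(), because once the built-in conversion has turned I into i the dotless form can no longer be recovered.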
unutbu answered Oct 06 '22 05:10


You can use a class derived from unicode, based on emre's solution:

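# -*- coding: utf-8 -*-

class unicode_tr(unicode):
    # Turkish-specific case pairs that the built-in conversion gets wrong.
    CHARMAP = {
        "to_upper": {
            u"ı": u"I",
            u"i": u"İ",
        },
        "to_lower": {
            u"I": u"ı",
            u"İ": u"i",
        }
    }

    def lower(self):
        text = self
        for key, value in self.CHARMAP["to_lower"].items():
            text = text.replace(key, value)
        # Call the base method explicitly so this override never recurses.
        return unicode.lower(text)

    def upper(self):
        text = self
        for key, value in self.CHARMAP["to_upper"].items():
            text = text.replace(key, value)
        # Call the base method explicitly so this override never recurses.
        return unicode.upper(text)

if __name__ == '__main__':
    # Pass unicode literals: a byte string containing İ cannot be decoded as ASCII.
    print unicode_tr(u"kitap").upper()
    print unicode_tr(u"KİTAP").lower()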

Gives

KİTAP
kitap

This should solve your problem.
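
One thing to watch with a subclass like this: if lower() and upper() hand back a plain Unicode string, a chained call such as .lower().upper() falls back to the default rules on the second step. Below is a rough variant that re-wraps the result. The class name TurkishStr and the text_type alias are my own, not from the answer above, and the code is written so that it also loads on Python 3, where unicode no longer exists.

```python
# -*- coding: utf-8 -*-

text_type = type(u'')  # unicode on Python 2, str on Python 3


class TurkishStr(text_type):
    """Hypothetical string subclass whose lower()/upper() follow Turkish rules."""

    _LOWER = {ord(u'I'): u'ı', ord(u'İ'): u'i'}
    _UPPER = {ord(u'ı'): u'I', ord(u'i'): u'İ'}

    def lower(self):
        # Fix the special pairs first, then use the base-class conversion,
        # and wrap the result so later calls keep the Turkish behaviour.
        return TurkishStr(text_type.lower(self.translate(self._LOWER)))

    def upper(self):
        return TurkishStr(text_type.upper(self.translate(self._UPPER)))


print(TurkishStr(u'KİTAP').lower())          # kitap
print(TurkishStr(u'IŞIK').lower().upper())   # IŞIK
```

Other methods (title(), capitalize(), swapcase() and so on) are not covered here and would still use the default rules.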

guneysus answered Oct 06 '22 05:10