Changing the case of letters in a unicode string containing accented and local letters

Tags: python, string, case-sensitive, unicode

Python string and unicode objects have the following methods for string case conversion:

upper()
lower()
title()

Using unicode strings, I can handle nearly all characters in my local alphabet:

test_str = u"ças şak ürt örkl"
print test_str.upper()
>> ÇAS ŞAK ÜRT ÖRKL

Except for two letters. Since I live in Turkey, I have the typical Turkish I problem.

In my local alphabet, we have a letter İ, which is similar to I, and their case conversions must be as follows:

`I → lowercase → ı`

`i → uppercase → İ`

And yes, this breaks the ASCII conversion i --> I, since i and I are two separate letters in Turkish.

test_str = u"ik"
print test_str.upper()
>> IK  # Wrong! must be İK
test_str = u"IK"
print test_str.lower()
>> ik  # Wrong! must be ık

How can I overcome this? Is there a way to handle case conversions correctly using Python built-ins?


asked Mar 05 '14 13:03

FallenAngel

1 Answer

Python currently doesn't have any support for locale-specific case folding, or for the other rules in Unicode SpecialCasing.txt. If you need them today, you can get them from PyICU.

>>> import icu  # provided by the PyICU package
>>> unicode( icu.UnicodeString(u'IK').toLower(icu.Locale('TR')) )
u'ık'

Although if all you care about is the Turkish I, you might prefer to just special-case it.
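
A minimal sketch of what that special-casing could look like, assuming the input is always Turkish text. The helper names `turkish_upper` and `turkish_lower` are made up for illustration and are not part of Python or any library. The idea is to swap the two Turkish-specific pairs by hand first, then let the built-in `upper()` and `lower()` handle every other letter:

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers (not part of any library): handle the dotted and
# dotless I explicitly, then defer to the built-in case conversions.

DOTTED_CAPITAL_I = u"\u0130"  # İ
DOTLESS_SMALL_I = u"\u0131"   # ı


def turkish_upper(text):
    # i -> İ and ı -> I must happen before upper(), which would map i -> I.
    text = text.replace(u"i", DOTTED_CAPITAL_I)
    text = text.replace(DOTLESS_SMALL_I, u"I")
    return text.upper()


def turkish_lower(text):
    # İ -> i and I -> ı must happen before lower(), which would map I -> i.
    text = text.replace(DOTTED_CAPITAL_I, u"i")
    text = text.replace(u"I", DOTLESS_SMALL_I)
    return text.lower()


print(turkish_upper(u"ik"))  # İK
print(turkish_lower(u"IK"))  # ık
```

This applies the Turkish rules to every `i` and `I` in the string, so it will give wrong results on mixed-language text; for that, the locale-aware PyICU route is the safer choice.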


answered Sep 18 '22 17:09

bobince
