How to simulate Python 2's unicode.lower() in Python 3

There appears to be a difference between how Python 2.7.15 and Python 3.7.2 perform the lowercase operation.

I have a large dictionary and a large list, both written with Python 2, that I want to use in Python 3 (loaded from file with pickle). For each string in the list, the dict has a key equal to that string's Python 2 lower() result. Unfortunately, Python 3's lower() does not always produce the same string.

How can I get, while running Python 3, the result that Python 2's unicode.lower() would have returned?

For example, one string in the list, as Python 3 reads it, is 'İle'. Its Python 3 lowercase is 'i̇le' (which, incidentally, is NOT the ASCII 'ile'), and that is not in the dictionary. Python 2 reads the same pickled string as u'\u0130le', whose lowercase is the ASCII-only "ile", which is in the dict. That is the value I need to get back.
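The mismatch comes from Python 3's str.lower() applying the full lowercase mapping from Unicode's SpecialCasing.txt, under which U+0130 becomes two code points: 'i' followed by U+0307 COMBINING DOT ABOVE. Python 2's unicode.lower() used the one-to-one simple mapping and gave just 'i'. A short Python 3 check, as a sketch:

```python
import unicodedata

s = '\u0130le'   # 'İle', as Python 3 reads the pickled string
low = s.lower()  # full case mapping: U+0130 -> 'i' + U+0307

print(len(s), len(low))            # 3 4
print([hex(ord(c)) for c in low])  # ['0x69', '0x307', '0x6c', '0x65']
print(unicodedata.name(low[1]))    # COMBINING DOT ABOVE
print(low == 'ile')                # False
```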

To clarify, here is an example (the right-hand side is the ASCII string "ile"):

Python 2.7:

>>> u"\u0130le".lower() == "ile"
True

Python 3.7:

>>> u"\u0130le".lower() == "ile"
False
asked Feb 18 '19 at 19:02 by jtbr


1 Answer

Brute-force solution.

Build a lowercase lookup map in Python 2, then use it from Python 3.

Python 2 program to create the map (one entry per BMP code point, U+0000 to U+FFFF):

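# Python 2: store unicode.lower() of every BMP code point, UTF-8 encoded
with open('py2_lower_map', 'wb') as f:  # binary mode: no newline translation
    for cp in range(0x10000):
        if 0xD800 <= cp <= 0xDFFF:
            # lone surrogates are not valid UTF-8 for Python 3; use a placeholder
            low = u'?'
        else:
            # Python 2's lower() maps each character to exactly one character
            low = unichr(cp).lower()
        f.write(low.encode('utf-8'))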

Demo of how to use the map in Python 3:

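# Python 3: load the map; newline='' keeps '\r' (U+000D) from becoming '\n'
with open('py2_lower_map', 'r', encoding='utf-8', newline='') as f:
    _py2_lower_map = f.read()

def py2_lower(u):
    # entry N of the map is the Python 2 lowercase of code point N;
    # code points above U+FFFF are not in the map and are left unchanged
    return ''.join(_py2_lower_map[ord(c)] if ord(c) < len(_py2_lower_map) else c
                   for c in u)

low = py2_lower('İle')
print(low)                    # ile
print([ord(c) for c in low])  # [105, 108, 101]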

To be honest, this is quick and dirty and may have rough edges, but it should mostly do the right thing. It works on one example ;-)
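If running Python 2 to build the map is not an option, a rougher Python 3-only approximation may be enough. This sketch is not part of the answer above, and `py2_lower_approx` is a hypothetical name. It lowercases each character on its own, which sidesteps Python 3's context-dependent final-sigma rule (a word-final 'Σ' becomes 'ς' in Python 3 but 'σ' in Python 2), and keeps only the first code point of each result, which turns 'i' + U+0307 back into plain 'i'. It still relies on Python 3's Unicode database (11.0.0 in Python 3.7) rather than Python 2.7's (5.2.0), so characters whose case mappings were added or changed between those versions can still differ; the lookup map remains the exact approach.

```python
def py2_lower_approx(u):
    # Hypothetical helper approximating Python 2's unicode.lower():
    # - lowercasing one character at a time means 'Σ' never sees its
    #   neighbours, so Python 3's final-sigma rule does not apply
    # - [0] keeps only the first code point of a multi-code-point result,
    #   e.g. 'i\u0307' (from U+0130) becomes 'i'
    # Case mappings still come from Python 3's own Unicode database.
    return ''.join(c.lower()[0] for c in u)


word = '\u039f\u0394\u039f\u03a3'         # Greek 'ΟΔΟΣ'
print(py2_lower_approx('\u0130le') == 'ile')  # True
print(py2_lower_approx(word))             # οδοσ (plain sigma, as in Python 2)
print(word.lower())                       # οδος (final sigma, Python 3)
```

With the Python 2 map loaded, `[cp for cp, low in enumerate(_py2_lower_map) if py2_lower_approx(chr(cp)) != low]` lists the BMP code points where the approximation disagrees (surrogates will show up because of the '?' placeholder).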

answered Oct 11 '22 at 17:10 by mkiever