There appears to be a difference between how Python 2.7.15 and 3.7.2 perform the lowercase operation.
I have a large dictionary and a large list which were written using Python 2, but which I want to use in Python 3 (imported from file using pickle). For each string in the list, the dict has a key equal to its Python 2 lower() result. Unfortunately, that is not always the same as its Python 3 lower() result.
How can I get what Python 2 would have returned for unicode.lower() while running in Python 3?
An example of a string in the list, as read by Python 3, is 'İle', the lowercase of which is 'i̇le' (which, incidentally, is NOT the ASCII 'ile'). This is not in the dictionary. From the same pickle, what Python 3 reads as "İle" is read into Python 2 as u'\u0130le', the lowercase of which is "ile" (the ASCII string), which is in the dict. And that's what I need to return.
To clarify, I'm adding an example (where the right-hand side is the ASCII string).
Python 2.7:
>>> u"\u0130le".lower() == "ile"
True
Python 3.7:
>>> u"\u0130le".lower() == "ile"
False
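To see where the two results diverge, it can help to look at the code points Python 3 produces. The snippet below is a small illustration, not part of the original question; the variable names are arbitrary. In Python 3, lower() applies the full Unicode case mapping, which expands U+0130 into 'i' followed by the combining dot above (U+0307), so the result is one character longer than the input.

```python
# Python 3: inspect the code points produced by str.lower() for U+0130.
s = '\u0130le'
low = s.lower()

print([hex(ord(c)) for c in low])  # ['0x69', '0x307', '0x6c', '0x65']
print(low == 'ile')                # False
print(len(s), len(low))            # 3 4
```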
A brute-force solution: build a lowercase map in Python 2, then use that map in Python 3.
Python 2 program to create the map:
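# Python 2: write the lowercase form of every BMP code point, in order.
f = open('py2_lower_map', 'wb')  # binary mode, so newlines are not translated
for i in range(256):
    for j in range(256):
        b = chr(j) + chr(i)  # little-endian UTF-16 bytes of code point i*256 + j
        try:
            # Explicit byte order, so the BOM code points are not swallowed.
            low = b.decode('utf-16-le').lower()
        except UnicodeDecodeError:
            low = u'?'  # lone surrogates cannot be decoded
        f.write(low.encode('utf-8'))
f.close()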
Demo of how to use the map in Python 3:
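# newline='' keeps '\r' as it is, so every character stays at its own index.
with open('py2_lower_map', 'r', encoding='utf-8', newline='') as f:
    _py2_lower_map = f.read()

def py2_lower(u):
    # The map only covers the BMP; characters above U+FFFF pass through unchanged.
    return ''.join(_py2_lower_map[ord(c)] if ord(c) < len(_py2_lower_map) else c
                   for c in u)

low = py2_lower('İle')
print(low)
print([ord(c) for c in low])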
To be honest, this is quick and dirty and may have rough edges, but it should mostly do the right thing. It works on one example ;-)
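If shipping a map file is inconvenient, a rough alternative is to lowercase one character at a time in Python 3 and keep only the first code point of each result. This is only a sketch under assumptions, not an exact reproduction of Python 2: approx_py2_lower is a hypothetical name, and because Python 2.7 and Python 3.7 bundle different versions of the Unicode database, characters added or changed in later Unicode versions may still lowercase differently. The map approach above avoids that, since it records what Python 2 actually returned.

```python
def approx_py2_lower(text):
    """Approximate Python 2's unicode.lower() in Python 3 (hypothetical helper)."""
    out = []
    for ch in text:
        # Lowercasing one character at a time avoids context-dependent rules
        # such as the word-final sigma.
        low = ch.lower()
        # The full case mapping can expand one character into several
        # (U+0130 -> 'i' + U+0307); keep only the first code point.
        out.append(low[:1])
    return ''.join(out)

print(approx_py2_lower('\u0130le') == 'ile')  # True
```

Before relying on it, it would be worth checking every item in the list against the dictionary keys and falling back to the map for any that do not match.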