While migrating data from an old, inconsistent MySQL database of unknown origin to a UTF-8 PostgreSQL database using the Python (Django) ORM, I sometimes end up with incorrectly encoded data.
Target: grégory
> a
u'gr\xe3\xa9gory'
> print a
grã©gory
I tried several decode/encode tricks without success:
> print a.encode('utf-8').decode('latin1')
grã©gory
> print a.decode('latin-1')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-3: ordinal not in range(128)
I also tried unicode_escape, without success.
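I guess the string was incorrectly converted to lowercase at some point, changing \xc3 to \xe3. The lowercase conversion assumed latin1 encoding when the data was actually utf-8: in latin1, \xc3 is Ã, and its lowercase form ã is \xe3. With the original byte restored, the string decodes correctly: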
>>> print 'gr\xc3\xa9gory'.decode('utf8')
grégory
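If that guess is right, the damage can be reproduced and, in simple cases, partly undone. The following is a minimal Python 3 sketch (the transcripts above are Python 2), not a general fix. The function names `simulate_corruption` and `repair` are hypothetical, and the repair assumes the text only contains 2-byte UTF-8 sequences (typical accented Latin letters). Genuine 3-byte or 4-byte sequences would be damaged by it, and ASCII letters that were lowercased cannot be recovered at all.

```python
def simulate_corruption(text):
    """Reproduce the suspected bug: UTF-8 bytes read as Latin-1, then lowercased."""
    return text.encode("utf-8").decode("latin-1").lower()


def repair(mangled):
    """Best-effort undo: restore lowercased 2-byte lead bytes, then decode as UTF-8.

    Raises UnicodeEncodeError or UnicodeDecodeError if the input does not
    match the assumed corruption pattern.
    """
    raw = bytearray(mangled.encode("latin-1"))
    for i in range(len(raw) - 1):
        lead, following = raw[i], raw[i + 1]
        # Latin-1 lowercasing maps 0xC0-0xDE (except 0xD7) to 0xE0-0xFE,
        # so a byte in 0xE2-0xFE (except 0xF7) followed by a UTF-8
        # continuation byte (0x80-0xBF) may be a lowercased lead byte.
        if 0xE2 <= lead <= 0xFE and lead != 0xF7 and 0x80 <= following <= 0xBF:
            raw[i] = lead - 0x20
    return raw.decode("utf-8")


if __name__ == "__main__":
    broken = simulate_corruption("grégory")
    print(repr(broken))    # 'grã©gory'
    print(repair(broken))  # grégory
```

Because the heuristic can misfire, it is safer to run it only on values that are known to be broken and to review the results before writing them back to the database.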