In a text file (test.txt), my string looks like this:
Gro\u00DFbritannien
Reading it, python escapes the backslash:
>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien'
How can I have this interpreted as unicode? decode()
and unicode()
won't do the job.
The following code writes Gro\u00DFbritannien
back to the file, but I want it to be Großbritannien
>>> input.decode('latin-1')
u'Gro\\u00DFbritannien'
>>> out = codecs.open('out.txt', 'w', 'utf-8')
>>> out.write(input)
You want to use the unicode_escape
codec:
>>> x = 'Gro\\u00DFbritannien'
>>> y = unicode(x, 'unicode_escape')
>>> print y
Großbritannien
See the docs for the vast number of standard encodings that come as part of the Python standard library.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With