
Python file input string: how to handle escaped unicode characters?

In a text file (test.txt), my string looks like this:

Gro\u00DFbritannien

When I read it, Python shows the backslash escaped:

>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien'

How can I have this interpreted as Unicode? Neither decode() nor unicode() does the job.

The following code writes Gro\u00DFbritannien back to the file, but I want it to be Großbritannien:

>>> import codecs
>>> input.decode('latin-1')
u'Gro\\u00DFbritannien'
>>> out = codecs.open('out.txt', 'w', 'utf-8')
>>> out.write(input)
Michi asked Dec 12 '22 22:12



1 Answer

You want to use the unicode_escape codec:

>>> x = 'Gro\\u00DFbritannien'
>>> y = unicode(x, 'unicode_escape')
>>> print y
Großbritannien

See the codecs module documentation for the full list of standard encodings that ship with the Python standard library.
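For readers on Python 3, where the unicode() builtin no longer exists, the same codec can be reached by encoding the string to bytes first. The following is only a minimal sketch under stated assumptions: the input is pure ASCII, and the helper names decode_escapes and convert_file are made up for illustration and do not come from the question or the answer.

```python
import io


def decode_escapes(text):
    """Replace literal \\uXXXX sequences in an ASCII-only str with the characters they name."""
    # unicode_escape decodes bytes, so go through ASCII first.
    # Assumption: the text is pure ASCII. Non-ASCII input raises
    # UnicodeEncodeError here instead of being silently mangled.
    # Note that other backslash sequences (such as \t) are interpreted too.
    return text.encode("ascii").decode("unicode_escape")


def convert_file(src_path, dst_path):
    """Read src_path line by line, decode the escapes, and write UTF-8 to dst_path."""
    with io.open(src_path, "r", encoding="ascii") as src:
        with io.open(dst_path, "w", encoding="utf-8") as dst:
            for line in src:
                dst.write(decode_escapes(line))
```

Calling convert_file('test.txt', 'out.txt') on the file from the question should then leave Großbritannien in out.txt, encoded as UTF-8.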

Alex Martelli answered Mar 09 '23 01:03
