
Unicode utf-8/utf-16 encoding in Python

In Python:

u'\u3053\n'

Is this UTF-16?

I don't really understand Unicode and encodings, but values like this keep coming up in my dataset, for example a=u'\u3053\n'.

Printing it raises an exception, and so does decoding it.

a.encode("utf-16") > '\xff\xfeS0\n\x00'
a.encode("utf-8") > '\xe3\x81\x93\n'

print a.encode("utf-8") > πüô
print a.encode("utf-16") >  ■S0

What's going on here?

8steve8 asked Aug 04 '09 19:08


1 Answer

It's a Unicode character that apparently can't be represented in your terminal's encoding. print tries to encode the unicode object using the encoding of your terminal, and if that can't be done you get an exception.
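
A rough sketch of that failure, for illustration only. The encode step that print performs can be reproduced by hand. The cp437 code page used here is an assumption (it is a common Windows console default, and the question does not say which encoding the terminal uses). Note also that u'\u3053' itself is not UTF-16 or any other encoding: the escape names the code point U+3053, and bytes only appear once you call encode.

```python
a = u'\u3053\n'

# Assumption: the terminal uses code page 437; the question does not say.
assumed_terminal_encoding = 'cp437'

try:
    # Roughly what print has to do before writing to the terminal.
    a.encode(assumed_terminal_encoding)
    failed = False
except UnicodeEncodeError as exc:
    # cp437 has no slot for U+3053, so the encode step raises.
    failed = True
    print('cannot encode for %s: %s' % (assumed_terminal_encoding, exc.reason))

# Encodings that cover all of Unicode, such as UTF-8, always succeed.
utf8_bytes = a.encode('utf-8')
```

The successful encode at the end yields the same bytes shown in the question; whether the terminal can then display them is a separate matter.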

On a terminal that can display UTF-8 you get:

>>> print u'\u3053'
こ

Your terminal doesn't seem to be able to display UTF-8; otherwise the print a.encode("utf-8") line, at least, would have produced the correct character.
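
The garbage in the question is consistent with this explanation. As a sketch, assuming the terminal is on code page 437 (a guess, not something the question states), reinterpreting the encoded bytes in that code page reproduces what was printed:

```python
a = u'\u3053\n'

# Assumption: the terminal interprets incoming bytes as code page 437.
assumed_terminal_encoding = 'cp437'

# The UTF-8 bytes from the question: '\xe3\x81\x93\n'
utf8_bytes = a.encode('utf-8')
# Read byte by byte as cp437, they become pi, u-umlaut, o-circumflex.
seen_for_utf8 = utf8_bytes.decode(assumed_terminal_encoding)

# The UTF-16 bytes exactly as shown in the question (BOM, then 'S0', then newline).
utf16_bytes = b'\xff\xfeS0\n\x00'
# Read as cp437: a no-break space, a black square, then the ASCII text 'S0'.
seen_for_utf16 = utf16_bytes.decode(assumed_terminal_encoding)
```

Under that assumption the bytes were correct all along and only the terminal's interpretation of them was wrong.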

sth answered Oct 12 '22 23:10