Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1

I'm having a few issues trying to encode a string to UTF-8. I've tried numerous things, including using string.encode('utf-8') and unicode(string), but I get the error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1: ordinal not in range(128)

This is my string:

(。・ω・。)ノ 

I don't see what's going wrong, any idea?

Edit: The problem is that printing the string as it is does not show properly. Also, this error when I try to convert it:

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53) [GCC 4.5.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89' >>> s1 = s.decode('utf-8') >>> print s1 Traceback (most recent call last):   File "<stdin>", line 1, in <module> UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128) 
like image 424
Markum Avatar asked May 12 '12 07:05

Markum


People also ask

What does UnicodeDecodeError mean?

The UnicodeDecodeError normally happens when decoding an str string from a certain coding. Since codings map only a limited number of str strings to unicode characters, an illegal sequence of str characters will cause the coding-specific decode() to fail.


2 Answers

This is to do with the encoding of your terminal not being set to UTF-8. Here is my terminal

$ echo $LANG en_GB.UTF-8 $ python Python 2.7.3 (default, Apr 20 2012, 22:39:59)  [GCC 4.6.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89' >>> s1 = s.decode('utf-8') >>> print s1 (。・ω・。)ノ >>>  

On my terminal the example works with the above, but if I get rid of the LANG setting then it won't work

$ unset LANG $ python Python 2.7.3 (default, Apr 20 2012, 22:39:59)  [GCC 4.6.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89' >>> s1 = s.decode('utf-8') >>> print s1 Traceback (most recent call last):   File "<stdin>", line 1, in <module> UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128) >>>  

Consult the docs for your linux variant to discover how to make this change permanent.

like image 180
Nick Craig-Wood Avatar answered Oct 03 '22 04:10

Nick Craig-Wood


try:

string.decode('utf-8')  # or: unicode(string, 'utf-8') 

edit:

'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'.decode('utf-8') gives u'(\uff61\uff65\u03c9\uff65\uff61)\uff89', which is correct.

so your problem must be at some oter place, possibly if you try to do something with it were there is an implicit conversion going on (could be printing, writing to a stream...)

to say more we'll need to see some code.

like image 41
mata Avatar answered Oct 03 '22 05:10

mata