
Unicode error: ordinal not in range

Tags:

python

unicode

I've run into an odd Unicode error. My code was handling Unicode fine, but when I ran it this morning one item, u'\u201d', raised this error:

UnicodeError: ASCII encoding error: ordinal not in range(128)

I looked up the code and apparently it's UTF-32, but when I try to decode it in the interpreter:

c = u'\u201d'
c.decode('utf-32', 'replace')

The same happens with any other operation on it, for that matter: no codec seems to recognize it, even though I found it listed as "RIGHT DOUBLE QUOTATION MARK".

The decode call gives me:

Traceback (most recent call last):
  File "<pyshell#154>", line 1, in <module>
    c.decode('utf-32')
  File "C:\Python27\lib\encodings\utf_32.py", line 11, in decode
    return codecs.utf_32_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201d' in position 0: ordinal not in range(128)
asked Sep 22 '12 at 16:09 by rodling




1 Answer

You already have a unicode string; there is no need to decode it to a unicode string again.

What happens in that case is that Python 2 helpfully tries to encode it for you first, so that it can then decode the result from UTF-32. It uses the default encoding to do so, which happens to be ASCII (as reported by sys.getdefaultencoding()). Here is an explicit encode that shows the exception raised in that case:

>>> u'\u201d'.encode('ASCII')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201d' in position 0: ordinal not in range(128)
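Python 3 no longer performs this implicit step (a text `str` has no `.decode()` method at all), but the mechanism can be modeled explicitly. The sketch below assumes Python 3; `py2_style_decode` is a hypothetical function written only to mimic the Python 2 behaviour described above, namely encoding with the ASCII default first and only then decoding. It also confirms the character's name and shows that its code point, 0x201D (8221), is above 127, which is why ASCII rejects it while UTF-32 round-trips it without trouble.

```python
import unicodedata


def py2_style_decode(text, encoding, default_encoding='ascii'):
    """Hypothetical model of Python 2's unicode.decode(): the text is
    first encoded with the default encoding, then decoded with `encoding`."""
    return text.encode(default_encoding).decode(encoding)


c = '\u201d'  # a Python 3 str literal is already Unicode text

name = unicodedata.name(c)  # 'RIGHT DOUBLE QUOTATION MARK'
code_point = ord(c)         # 8221 (0x201D), outside range(128)

# The implicit ASCII encode fails before any UTF-32 decoding is attempted.
try:
    py2_style_decode(c, 'utf-32')
    error = None
except UnicodeEncodeError as exc:
    error = exc.reason      # 'ordinal not in range(128)'

# Encoding with a codec that covers all of Unicode, then decoding with the
# same codec, returns the original text unchanged.
round_trip = c.encode('utf-32').decode('utf-32')

print(name, code_point, error, round_trip == c)
```

The failure comes from the ASCII encode step, not from UTF-32, which matches the traceback in the question: the error is raised from inside the decode call, but it is an encode error.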

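In short, when you have a unicode literal like u'', there is no need to decode it: decode byte strings (str) into unicode, and encode unicode into bytes only when you need to write it out.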
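One common way to act on this is to decode bytes once, where they enter the program, and encode text once, where it leaves. The sketch below assumes Python 3 and uses a hypothetical helper, `to_text`, that decodes only when it is given bytes and passes existing text through untouched. UTF-8 as the input encoding is also an assumption; substitute whatever encoding your data actually uses.

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return str, decoding only if given bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)  # bytes -> text: decode
    if isinstance(value, str):
        return value                   # already text: nothing to decode
    raise TypeError(f'expected bytes or str, got {type(value).__name__}')


raw = b'\xe2\x80\x9d'    # U+201D as UTF-8 bytes, e.g. read from a file
text = to_text(raw)      # decoded once, at the input boundary
same = to_text(text)     # calling it again on text is a harmless no-op

# text -> bytes: encode once, at the output boundary, with an encoding
# that can represent the character (ASCII could not).
out = text.encode('utf-8')
print(ascii(text), ascii(same), out)
```

Keeping the `isinstance` checks explicit means text is never passed through `.decode()` a second time, which is the mistake behind the error in the question.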

Read up on Unicode, encodings, and the default settings in the Python Unicode HOWTO. Another invaluable article on the subject is Joel Spolsky's Minimum Unicode knowledge post.

answered Sep 19 '22 at 12:09 by Martijn Pieters