
What is unicode_literals used for?

I get a weird problem with __future__.unicode_literals in Python. Without importing unicode_literals I get the correct output:

    # encoding: utf-8
    # from __future__ import unicode_literals
    name = 'helló wörld from example'
    print name

But when I add the unicode_literals import:

    # encoding: utf-8
    from __future__ import unicode_literals
    name = 'helló wörld from example'
    print name

I got this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 4: ordinal not in range(128) 

Does unicode_literals encode every string as UTF-8? What should I do to get rid of this error?

asked Apr 29 '14 16:04 by ssj



1 Answer

Your terminal or console is failing to let Python know it supports UTF-8.

Without the from __future__ import unicode_literals line, you are building a byte string that holds UTF-8 encoded bytes. With the import, you are building a unicode string.
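To make the difference between the two kinds of string concrete, here is a minimal sketch. It is written so that it also runs on Python 3, which is an assumption beyond the Python 2 code in the question: there, plain literals are always unicode, so the byte string is produced with an explicit .encode() call instead of a plain literal. The names text and data are illustrative only.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# With unicode_literals (and always on Python 3) this is a unicode string:
# a sequence of code points, not bytes.
text = 'helló wörld from example'

# A byte string holding the UTF-8 encoding of the same text. This is what
# the plain literal produces on Python 2 without the import, given a UTF-8
# encoded source file.
data = text.encode('utf-8')

# Only ASCII is printed here, so the output does not depend on the console.
print(type(text).__name__, len(text))    # 24 code points
print(type(data).__name__, len(data))    # 26 bytes: ó and ö take 2 bytes each
print('code point at index 4:', hex(ord(text[4])))   # 0xf3, as in the error
```

The character at index 4 is the u'\xf3' named in the error message from the question.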

print has to treat these two values differently; a byte string is written to sys.stdout unchanged. A unicode string is encoded to bytes first, and Python consults sys.stdout.encoding for that. If your system doesn't correctly tell Python what codec it supports, the default is to use ASCII.

Your system failed to tell Python what codec to use; sys.stdout.encoding is set to ASCII, and encoding the unicode value to print failed.
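A quick way to check this on your own machine is to look at sys.stdout.encoding and to try the ASCII encoding step by hand. The following is a small sketch, not a fix; the variable names are illustrative, and the text is written with escapes so the snippet itself stays pure ASCII.

```python
from __future__ import print_function, unicode_literals
import sys

# Same text as in the question, written with escapes.
name = 'hell\xf3 w\xf6rld from example'

# What Python believes the console accepts. May be None when output is
# piped on Python 2, or an ASCII codec name on a misconfigured terminal.
print('sys.stdout.encoding =', sys.stdout.encoding)

# Reproduce the failing step explicitly: encoding the unicode value to ASCII.
error = None
try:
    name.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc
    print('ASCII encoding failed:', exc.reason, 'at position', exc.start)
```

If the first line of output reports an ASCII codec (or None), print has no way to encode the non-ASCII characters, which matches the traceback in the question.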

You can verify this by manually encoding to UTF-8 when printing:

    # encoding: utf-8
    from __future__ import unicode_literals
    name = 'helló wörld from example'
    print name.encode('utf8')

and you can reproduce the issue by creating unicode literals without the from __future__ import statement too:

    # encoding: utf-8
    name = u'helló wörld from example'
    print name

where u'..' is a unicode literal too.

Without details on what your environment is, it is hard to say what the solution is; this depends very much on the OS and console or terminal used.
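Because the right fix depends on the environment, the following is only a sketch of one defensive pattern, not a recommendation for every setup. It assumes the terminal really does display UTF-8 whenever Python reports no encoding at all. The helper name encode_for_stream and its fallback parameter are hypothetical, introduced here for illustration.

```python
from __future__ import print_function, unicode_literals
import sys


def encode_for_stream(text, stream_encoding, fallback='utf-8'):
    """Encode text for output without raising UnicodeEncodeError.

    Hypothetical helper: uses the codec the stream reports, or `fallback`
    when the stream reports none. Characters the codec cannot represent
    are replaced with '?' instead of raising.
    """
    return text.encode(stream_encoding or fallback, 'replace')


name = 'hell\xf3 w\xf6rld from example'

# Encode with whatever the current stdout claims to support.
encoded = encode_for_stream(name, sys.stdout.encoding)

# Show the resulting bytes as an ASCII-safe repr.
print(repr(encoded))
```

With an ASCII stream the accented characters degrade to '?' rather than crashing the script. Another commonly used option is to set the PYTHONIOENCODING environment variable (for example to utf-8) before starting Python, so that sys.stdout.encoding is no longer ASCII; whether that is appropriate depends on what the console can actually display.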

answered Sep 22 '22 06:09 by Martijn Pieters