In Python source code, Unicode literals are written as strings prefixed with the 'u' or 'U' character: u'abcdefghijk' . Specific code points can be written using the \u escape sequence, which is followed by four hex digits giving the code point. The \U escape sequence is similar, but expects 8 hex digits, not 4.
You can use the backslash (\) escape character to add single or double quotation marks in the python string format. The \n escape sequence is used to insert a new line without hitting the enter or return key. The part of the string after \n escape sequence appears in the next line.
Remarks. If encoding and/or errors are given, unicode() will decode the object which can either be an 8-bit string or a character buffer using the codec for encoding. The encoding parameter is a string giving the name of an encoding; if the encoding is not known, LookupError is raised.
A unicode escape sequence is a backslash followed by the letter 'u' followed by four hexadecimal digits (0-9a-fA-F). It matches a character in the target sequence with the value specified by the four digits. For example, ”\u0041“ matches the target sequence ”A“ when the ASCII character encoding is used.
Just make the second string also a unicode string
>>> s = u'\u2265'
>>> print s
≥
>>> print "{0}".format(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)
>>> print u"{0}".format(s)
≥
>>>
unicode
s need unicode
format strings.
>>> print u'{0}'.format(s)
≥
A bit more information on why that happens.
>>> s = u'\u2265'
>>> print s
works because print
automatically uses the system encoding for your environment, which was likely set to UTF-8. (You can check by doing import sys; print sys.stdout.encoding
)
>>> print "{0}".format(s)
fails because format
tries to match the encoding of the type that it is called on (I couldn't find documentation on this, but this is the behavior I've noticed). Since string literals are byte strings encoded as ASCII in python 2, format
tries to encode s
as ASCII, which then results in that exception. Observe:
>>> s = u'\u2265'
>>> s.encode('ascii')
Traceback (most recent call last):
File "<input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)
So that is basically why these approaches work:
>>> s = u'\u2265'
>>> print u'{}'.format(s)
≥
>>> print '{}'.format(s.encode('utf-8'))
≥
The source character set is defined by the encoding declaration; it is ASCII if no encoding declaration is given in the source file (https://docs.python.org/2/reference/lexical_analysis.html#string-literals)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With