I would like to print a Unicode character's code point in Python, not the glyph it represents.
For example, if u is a list of unicode characters:
>>> u[0]
u'\u0103'
>>> print u[0]
ă
I would like to output the character code as a raw string: u'\u0103'.
I have tried to just print it to a file, but this doesn't work without encoding it in UTF-8.
>>> w = open('~/foo.txt', 'w')
>>> print>>w, u[0].decode('utf-8')
Traceback (most recent call last):
  File "<pyshell#33>", line 1, in <module>
    print>>w, u[0].decode('utf-8')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0103' in position 0: ordinal not in range(128)
>>> print>>w, u[0].encode('utf-8')
>>> w.close()
Encoding it results in the glyph ă being written to the file.
How can I write the character code?
Use the "\u" escape sequence to print Unicode characters. In a string, place "\u" before the four hexadecimal digits that represent a Unicode code point, then use print() to print the string.
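To make the difference between the glyph and its escape concrete, here is a small sketch. It assumes Python 3 (where every str literal is Unicode); the variable names are only illustrative.

```python
# Assumes Python 3. Names `glyph` and `literal` are illustrative only.
glyph = "\u0103"      # escape is interpreted: a single character, U+0103
literal = r"\u0103"   # raw literal: six characters, backslash included

# print(glyph) would show the glyph itself if the terminal encoding supports it.
print(len(glyph), len(literal))
print(literal)  # the escape text, not the glyph
```

The raw literal only helps when the escape is typed in the source code; it does not convert an existing character back into its escape.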
To create an instance of unicode in Python 2, you can use the unicode() built-in or prefix a string literal with u, like so: my_unicode = u'This is my Unicode string.'. In Python 3, there is only one string type. Its name is str and it's Unicode.
The ord function in Python accepts a single character as an argument and returns an integer: the Unicode code point of that character.
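Building on ord, one way to get the escape text for a single character is to format the code point yourself. This is a minimal sketch assuming Python 3; code_point_escape is a hypothetical helper name, not a built-in.

```python
def code_point_escape(ch):
    """Return the \\uXXXX (or \\UXXXXXXXX) escape text for one character.

    Hypothetical helper for illustration; assumes Python 3.
    """
    cp = ord(ch)  # integer code point, e.g. 259 for U+0103
    if cp <= 0xFFFF:
        return "\\u{:04x}".format(cp)   # four hex digits inside the BMP
    return "\\U{:08x}".format(cp)       # eight hex digits outside the BMP


print(code_point_escape(u"\u0103"))
```

Unlike the codec-based approach, this escapes every character, including plain ASCII ones such as "A".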
To print the raw Unicode escape instead of the glyph, you only need to specify the right encoding:
>>> s = u'\u0103'
>>> print s.encode('raw_unicode_escape')
\u0103
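For the file-writing part of the question, a Python 3 version might look like the sketch below. It swaps in the unicode_escape codec, because in Python 3 raw_unicode_escape leaves Latin-1 characters as raw bytes, so its output is not guaranteed to be ASCII. The temporary file path is an assumption made for the example.

```python
import os
import tempfile

s = u"\u0103"

# encode() returns bytes in Python 3; decode back to str to get the escape text.
escaped = s.encode("unicode_escape").decode("ascii")

# Hypothetical output location; the question used '~/foo.txt'.
path = os.path.join(tempfile.gettempdir(), "foo.txt")
with open(path, "w") as w:
    w.write(escaped + "\n")

# Read the file back to confirm the escape, not the glyph, was written.
with open(path) as r:
    written = r.read()

print(written)
```

Note that open() does not expand "~" by itself; os.path.expanduser('~/foo.txt') would be needed to write to the home directory as in the question.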