
Convert unicode codepoint to UTF8 hex in python

Tags: python, unicode

I want to convert a number of Unicode code points read from a file to their UTF-8 encoding.

For example, I want to convert the string 'FD9B' to the string 'EFB69B'.

I can do this manually using string literals like this:

u'\uFD9B'.encode('utf-8')

but I cannot work out how to do it programmatically.

Richard asked May 15 '09 10:05


2 Answers

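Use the built-in function chr() to convert the number to a character, then encode that:

>>> chr(int('fd9b', 16)).encode('utf-8')
b'\xef\xb6\x9b'

This is the encoded byte string itself. If you want it as ASCII hex, you'd need to walk through it and convert each byte to hex. In Python 3, iterating over a bytes object yields integers, so hex(b) or format(b, '02x') works directly; in Python 2, iterating yields one-character strings, so you'd use hex(ord(c)) or similar.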

Note: If you are still stuck with Python 2, you can use unichr() instead.
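
The byte-to-hex step described above can be sketched in Python 3.5 or later with bytes.hex(), which does the per-byte conversion in one call. The function name codepoint_to_utf8_hex is only an illustrative choice, and the sketch assumes the input is a valid hex code point that is not a surrogate (surrogates such as D800 cannot be encoded as UTF-8 and raise UnicodeEncodeError):

```python
def codepoint_to_utf8_hex(codepoint_hex):
    """Return the UTF-8 encoding of a hex code point as an uppercase hex string.

    Hypothetical helper for illustration; assumes a valid, non-surrogate
    code point written in hexadecimal, e.g. 'FD9B'.
    """
    char = chr(int(codepoint_hex, 16))   # hex string -> int -> character
    encoded = char.encode('utf-8')       # character -> UTF-8 bytes
    return encoded.hex().upper()         # bytes -> two hex digits per byte


print(codepoint_to_utf8_hex('FD9B'))  # EFB69B
```

Because bytes.hex() always emits two digits per byte, bytes below 0x10 keep their leading zero.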

unwind answered Oct 01 '22 21:10


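Here's a complete solution for Python 2 (each byte is formatted as two hex digits, so bytes below 0x10 keep their leading zero):

>>> ''.join(['{0:02x}'.format(ord(x)) for x in unichr(int('FD9B', 16)).encode('utf-8')]).upper()
'EFB69B'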
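
The one-liner above relies on unichr(), which exists only in Python 2. For Python 3, and for the "read from a file" part of the question, a rough sketch might look like the following. It assumes one hex code point per line, which the question does not actually specify, and utf8_hex_lines is a hypothetical helper name. An in-memory io.StringIO stands in for the real file:

```python
import io


def utf8_hex_lines(lines):
    """Yield the uppercase UTF-8 hex string for each hex code point line.

    Assumption: one code point per line; blank lines are skipped.
    """
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore empty lines
        # hex string -> int -> character -> UTF-8 bytes -> hex digits
        yield chr(int(text, 16)).encode('utf-8').hex().upper()


# Stand-in for an open text file; a real file object iterates the same way.
sample = io.StringIO("FD9B\n0041\n\n1F600\n")
print(list(utf8_hex_lines(sample)))  # ['EFB69B', '41', 'F09F9880']
```

With a real file, the same generator could be fed from `with open(path) as f:`, where path is whatever file holds the code points.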

simon answered Oct 01 '22 22:10