Convert unicode codepoint to UTF8 hex in python

Question

I want to convert a number of unicode codepoints read from a file to their UTF8 encoding.

e.g I want to convert the string 'FD9B' to the string 'EFB69B'.

I can do this manually using string literals like this:

u'\uFD9B'.encode('utf-8')

but I cannot work out how to do it programatically.

unwind · Accepted Answer

Use the built-in function chr() to convert the number to character, then encode that:

>>> chr(int('fd9b', 16)).encode('utf-8')
'\xef\xb6\x9b'

This is the string itself. If you want the string as ASCII hex, you'd need to walk through and convert each character c to hex, using hex(ord(c)) or similar.

Note: If you are still stuck with Python 2, you can use unichr() instead.

simon · Answer

here's a complete solution:

>>> ''.join(['{0:x}'.format(ord(x)) for x in unichr(int('FD9B', 16)).encode('utf-8')]).upper()
'EFB69B'

Convert unicode codepoint to UTF8 hex in python

Tags:

python

unicode

Richard

2 Answers

unwind

simon

Recent Activity

Donate For Us

Convert unicode codepoint to UTF8 hex in python

Tags:

python

unicode

Richard

2 Answers

unwind

simon

Related questions

Recent Activity

Donate For Us