I want to convert a number of Unicode code points read from a file to their UTF-8 encoding.
E.g. I want to convert the string 'FD9B' to the string 'EFB69B'.
I can do this manually using string literals like this:
u'\uFD9B'.encode('utf-8')
but I cannot work out how to do it programmatically.
Use the built-in function chr() to convert the number to a character, then encode that:
>>> chr(int('fd9b', 16)).encode('utf-8')
b'\xef\xb6\x9b'
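This is the encoded byte string itself. If you want it as ASCII hex, you'd need to walk through and convert each byte to two hex digits, using '%02x' % b or similar (in Python 2, where each element c is a one-character string, apply ord(c) first). On Python 3.5+, the bytes method .hex() does this in one call.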
Note: If you are still stuck with Python 2, you can use unichr() instead.
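Here's a complete solution (Python 2); the 02 in the format pads each byte to two hex digits so bytes below 0x10 are not shortened:
>>> ''.join(['{0:02x}'.format(ord(x)) for x in unichr(int('FD9B', 16)).encode('utf-8')]).upper()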
'EFB69B'
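For Python 3, the same idea can be sketched as follows. This is a minimal example rather than a definitive implementation: the function name codepoint_hex_to_utf8_hex and the sample list lines are made up for illustration, and it assumes each input is a single code point written in hex, one per line, as it might come out of the file.

```python
def codepoint_hex_to_utf8_hex(codepoint_hex):
    """Return the UTF-8 encoding of a hex code point as an uppercase hex string."""
    code_point = int(codepoint_hex, 16)          # 'FD9B' -> 0xFD9B
    encoded = chr(code_point).encode('utf-8')    # bytes object, e.g. b'\xef\xb6\x9b'
    return encoded.hex().upper()                 # bytes.hex() needs Python 3.5+


# Hypothetical input: stands in for lines read from the file.
lines = ['FD9B', '0041', '20AC']
results = [codepoint_hex_to_utf8_hex(line.strip()) for line in lines]
print(results)  # ['EFB69B', '41', 'E282AC']
```

Note that surrogate code points (D800 to DFFF) cannot be encoded as UTF-8, so encode() raises UnicodeEncodeError for them; whether to skip or report such input is left to the caller.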