I have a hex-string made from a unicode string with that function:
def toHex(s):
res = ""
for c in s:
res += "%02X" % ord(c) #at least 2 hex digits, can be more
return res
hex_str = toHex(u"...")
This returns a string like this one:
"80547CFB4EBA5DF15B585728"
That's a sequence of 6 chinese symbols.
But
u"Knödel"
converts to
"4B6EF664656C"
What I need now is a function to convert this back to the original unicode. The chinese symbols seem to have a 2-byte representation while the second example has 1-byte representations for all characters. So I can't just use unichr() for each 1- or 2-byte block.
I've already tried
binascii.unhexlify(hex_str)
but this seems to convert byte-by-byte and returns a string, not unicode. I've also tried
binascii.unhexlify(hex_str).decode(...)
with different formats. Never got the original unicode string.
Thank you a lot in advance!
This seems to work just fine:
binascii.unhexlify(binascii.hexlify(u"Knödel".encode('utf-8'))).decode('utf-8')
Comes back to the original object. You can do the same for the chinese text if it's encoded properly, however ord(x)
already destroys the text you started from. You'll need to encode it first and only then treat like a string of bytes.
Can't be done. Using %02X
loses too much information. You should be using something like UTF-8 first and converting that, instead of inventing a broken encoding.
>>> u"Knödel".encode('utf-8').encode('hex')
'4b6ec3b664656c'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With