Converting a hex-string representation to actual bytes in Python

Question

I need to load the third column of this text file as a hex string:

http://www.netmite.com/android/mydroid/1.6/external/skia/emoji/gmojiraw.txt

>>> open('gmojiraw.txt').read().split('\n')[0].split('\t')[2]
'\\xF3\\xBE\\x80\\x80'

How do I open the file so that I get the third column as a hex string:

'\xF3\xBE\x80\x80'

I also tried binary mode and hex mode, with no success.

Eli Bendersky · Accepted Answer

You can:

1. Remove the \x prefixes
2. Use .decode('hex') on the resulting string (Python 2)

Code:

>>> '\\xF3\\xBE\\x80\\x80'.replace('\\x', '').decode('hex')
'\xf3\xbe\x80\x80'

Note the appropriate interpretation of backslashes. When the string representation is '\xf3', it means a single-byte string with the byte value 0xF3. When it is '\\xf3', which is your input, it means a string consisting of 4 characters: \, x, f and 3.
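The answer above relies on .decode('hex'), which only exists on Python 2 strings. As a rough sketch of the same two steps on Python 3 (an assumption, since the thread itself is about Python 2), bytes.fromhex can stand in for the hex codec. The helper name hex_escapes_to_bytes is made up for illustration.

```python
def hex_escapes_to_bytes(text):
    """Convert text made of literal backslash-x escapes into real bytes."""
    # Step 1: drop every literal backslash + 'x' pair, leaving only hex digits.
    hex_digits = text.replace("\\x", "")
    # Step 2: parse the digits as bytes (Python 3 stand-in for .decode('hex')).
    return bytes.fromhex(hex_digits)


# The value as read from the file: 16 characters, not 4 bytes.
raw = "\\xF3\\xBE\\x80\\x80"
converted = hex_escapes_to_bytes(raw)
print(len(raw), len(converted))
```

This only works when the column contains nothing but \xNN escapes; any other character left after the replace step makes bytes.fromhex raise ValueError.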

tzot · Answer

Quick'n'dirty reply

your_string.decode('string_escape')

>>> a='\\xF3\\xBE\\x80\\x80'
>>> a.decode('string_escape')
'\xf3\xbe\x80\x80'
>>> len(_)
4
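The 'string_escape' codec is also Python 2 only. One possible Python 3 approximation, offered as a sketch rather than an exact equivalent, chains the 'unicode_escape' and 'latin-1' codecs. The function name unescape_to_bytes is hypothetical, and the approach assumes the input is ASCII text whose escapes all denote values below 256.

```python
def unescape_to_bytes(text):
    """Interpret backslash escapes in ASCII text and return the resulting bytes."""
    # The escape codec decodes from bytes, so encode the ASCII text first.
    escaped = text.encode("ascii")
    # 'unicode_escape' turns each \xNN sequence into the code point NN.
    decoded = escaped.decode("unicode_escape")
    # 'latin-1' maps code points 0..255 one-to-one onto byte values.
    return decoded.encode("latin-1")


a = "\\xF3\\xBE\\x80\\x80"
result = unescape_to_bytes(a)
print(len(a), len(result))
```

Unlike the replace-and-hex approach, this also handles escapes such as \t or \n mixed with plain characters, but it fails with UnicodeEncodeError if the text contains a \u escape above U+00FF.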

Bonus info

>>> u='\uDBB8\uDC03'
>>> u.decode('unicode_escape')

Some trivia

Interestingly, I have Python 2.6.4 on Ubuntu Karmic Koala (sys.maxunicode == 1114111) and Python 2.6.5 on Gentoo (sys.maxunicode == 65535). On Ubuntu, the unicode_escape decode result is \uDBB8\uDC03, and on Gentoo it's u'\U000fe003', both correctly of length 2. Unless it's something fixed between 2.6.4 and 2.6.5, I'm impressed that the 2-byte-per-unicode-character Gentoo version reports the correct character.
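The relationship between the surrogate pair \uDBB8\uDC03 and the single character U+FE003 follows from the standard UTF-16 arithmetic. A minimal Python 3 sketch of that calculation is below; combine_surrogates is an illustrative name, not something from the thread.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid UTF-16 surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xDBB8, 0xDC03)
print(hex(code_point))
# UTF-8 encoding of the combined character.
print(chr(code_point).encode("utf-8"))
```

The UTF-8 form of the result starts with the same F3 BE 80 prefix as the bytes in the question, which suggests the file's third column holds UTF-8 encodings of characters in that same range.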

Tags: python, hex, representation

kevin
