
Python - converting wide-char strings from a binary file to Python unicode strings

It's been a long day and I'm a bit stumped.

I'm reading a binary file that contains lots of wide-char strings and I want to dump these out as Python unicode strings. (To unpack the non-string data I'm using the struct module, but I don't know how to do the same with the strings.)

For example, reading the word "Series":

myfile = open("test.lei", "rb")
myfile.seek(44)
data = myfile.read(12)

# data is now 'S\x00e\x00r\x00i\x00e\x00s\x00'

How can I decode that raw wide-char data into a Python unicode string?

Edit: I'm using Python 2.6

Mikesname asked Apr 30 '10 17:04



3 Answers

>>> data = 'S\x00e\x00r\x00i\x00e\x00s\x00'
>>> data.decode('utf-16')
u'Series'
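
As far as I know, when the data carries no byte-order mark, 'utf-16' falls back to the platform's native byte order, so naming the byte order explicitly may be more portable. A minimal sketch, assuming the file stores little-endian wide chars (which the sample bytes suggest); the b'' prefix is only there so the snippet also runs on Python 3:

```python
# Sample bytes from the question: "Series" as 16-bit little-endian code units.
data = b'S\x00e\x00r\x00i\x00e\x00s\x00'

# Name the byte order explicitly instead of relying on a BOM or the platform default.
text = data.decode('utf-16-le')

# The same text stored big-endian needs the matching codec.
data_be = b'\x00S\x00e\x00r\x00i\x00e\x00s'
text_be = data_be.decode('utf-16-be')
```

Both calls yield the same unicode string; only the codec name changes with the byte order of the file.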
interjay answered Nov 15 '22 02:11



I also recommend calling rstrip('\x00') after decoding to remove all trailing '\x00' characters, unless, of course, they are needed.

>>> data = 'S\x00o\x00m\x00e\x00\x20\x00D\x00a\x00t\x00a\x00\x00\x00\x00\x00'
>>> print '"%s"' % data.decode('utf-16').rstrip('\x00')
"Some Data"

Without rstrip('\x00') the result keeps the trailing NUL characters, which typically show up as spaces:

"Some Data  "
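
rstrip only removes NULs at the very end. If the strings sit in fixed-size fields, there may be leftover bytes after the first terminator, and cutting at the first NUL may be the safer choice. A rough sketch, where decode_wide is a hypothetical helper name and little-endian UTF-16 is an assumption about the file format:

```python
def decode_wide(raw, encoding='utf-16-le'):
    """Decode a wide-char field and cut it at the first NUL terminator.

    Hypothetical helper; the default encoding is an assumption.
    """
    text = raw.decode(encoding)
    # Splitting once on NUL keeps only the part before the first terminator.
    return text.split(u'\x00', 1)[0]

# The padded sample from above: "Some Data" followed by two NUL code units.
padded = b'S\x00o\x00m\x00e\x00\x20\x00D\x00a\x00t\x00a\x00\x00\x00\x00\x00'
result = decode_wide(padded)
```

Unlike rstrip, this also discards anything stored after an embedded terminator, which may or may not be what you want.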
Delimitry answered Nov 15 '22 04:11



If the string is known to contain no characters beyond U+00FF, another possibility is to drop the zero bytes by slicing, which produces a str rather than a unicode object:

>>> 'S\x00e\x00r\x00i\x00e\x00s\x00'[::2]
'Series'
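
The slice keeps the low byte of each little-endian code unit, so for characters up to U+00FF it should match a decode followed by a Latin-1 encode. A small sketch of that equivalence, plus a cheap guard for the "nothing beyond FF" assumption (the variable names are only illustrative, and the b'' prefix is there so it also runs on Python 3):

```python
data = b'S\x00e\x00r\x00i\x00e\x00s\x00'

# Keep every second byte, i.e. the low byte of each little-endian code unit.
sliced = data[::2]

# For characters up to U+00FF this matches decoding and re-encoding as Latin-1.
roundtrip = data.decode('utf-16-le').encode('latin-1')

# Guard: the dropped high bytes must all be zero, otherwise slicing loses data.
high_bytes = data[1::2]
safe_to_slice = high_bytes == b'\x00' * len(high_bytes)
```

If safe_to_slice is False, the field contains characters above U+00FF and a real decode is needed.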
kismet answered Nov 15 '22 02:11
