
python unpack little endian

Tags:

python

I'm trying to use Python to read a binary file. The file is in LSB (little-endian) mode. I import the struct module and use unpack like this:

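import sys
from struct import unpack

with open(sys.argv[1], 'rb') as f:
    # '<I' reads 4 bytes as one little-endian unsigned 32-bit integer
    contents = unpack('<I', f.read(4))[0]
print(contents)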

The data in the file is 0XC0000500 in LSB mode, and the actual value is 0X000500C0. So you can see that the smallest unit of the LSB mode is one byte.

However, I use a Mac, and perhaps because of my gcc version or the machine itself (I am not sure; I have only read what http://docs.python.org/library/struct.html says about sizes and sys.byteorder), the result from the above code is 0X0500C000, so the unit of the LSB mode appears to be 2 bytes.

How should I solve this problem?

I will keep digging whether or not this question is answered, and I will post an update if I find anything.

ps: The data file is an ELF file for a 32-bit machine.

pps: Since I am going to read a huge amount of data, and this is a general problem when reading it, doing the conversion manually is not the best option for me. The question is still open for answers.

ppps: < means "little-endian,standard size (16 bit)" Now I read this...

user1595754 asked Aug 28 '12 16:08



1 Answer

If the actual value is 0XABCD, then the file stores DCBA.

Byte order usually defines the order of bytes, not of the individual bits inside a byte. "\xDC\xBA" is two bytes (16 bits). If you swap the bytes, these are all the possible results:

>>> import binascii, struct
>>> "0X%04X" % struct.unpack("<H", binascii.unhexlify("DCBA"))
'0XBADC'
>>> "0X%04X" % struct.unpack(">H", binascii.unhexlify("DCBA"))
'0XDCBA'

Here's what 0xabcd looks like in little-endian and big-endian format:

>>> struct.pack('<H', 0xabcd)
'\xcd\xab'
>>> struct.pack('>H', 0xabcd)
'\xab\xcd'

To get 0XABCD from "\xDC\xBA" you would need to swap half-bytes (4-bit nibbles), which would be unusual.
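For illustration only, here is a minimal sketch of what such a half-byte swap would look like. The helper name swap_nibbles_16 is hypothetical, and the sketch assumes the file really does store the nibbles reversed, which is not a standard byte order.

```python
import struct

def swap_nibbles_16(value):
    """Swap the high and low 4-bit halves of each byte in a 16-bit value."""
    return ((value & 0x0F0F) << 4) | ((value & 0xF0F0) >> 4)

raw = b"\xDC\xBA"                    # the two bytes as stored in the file
(word,) = struct.unpack("<H", raw)   # ordinary little-endian read gives 0xBADC
fixed = swap_nibbles_16(word)        # then swap the half-bytes inside each byte
print("0X%04X" % fixed)              # 0XABCD
```

If your data needs this step, it is worth double-checking how the file was written before relying on it.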

Since I am going to read a huge amount of data

You could use the array module to read multiple values at once. Its type codes largely match the struct module's format characters, but it always uses native sizes and native byte order.
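A rough sketch of the array approach, assuming the file holds a run of little-endian unsigned 16-bit values. The function name read_u16_le is made up for this example. Because array stores items in native byte order, the sketch swaps bytes when running on a big-endian host.

```python
import array
import sys

def read_u16_le(f, count):
    """Read `count` little-endian unsigned 16-bit values from binary file `f`."""
    values = array.array("H")       # 'H' = unsigned short, native size
    if values.itemsize != 2:
        raise RuntimeError("unsigned short is not 2 bytes on this platform")
    values.fromfile(f, count)       # raises EOFError if the file is too short
    if sys.byteorder == "big":
        values.byteswap()           # array is native-endian; fix up on big-endian hosts
    return values
```

You would call it with a file opened in 'rb' mode, for example read_u16_le(f, 1000), and index the result like a list.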

< means "little-endian,standard size (16 bit)"

If you use < or > with the struct module, then the standard sizes are fixed and independent of the platform. The standard size depends only on the format character. In particular, '<H' is always 2 bytes (16 bits) and '<I' is always 4 bytes (32 bits). Only the @ prefix (the default) uses native sizes.
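You can check this yourself with struct.calcsize. The snippet below is just a quick sanity check; the native sizes printed at the end can differ between platforms, so no particular value is assumed for them.

```python
import struct

# Standard sizes: fixed by the format character, the same on every machine.
standard_sizes = {fmt: struct.calcsize(fmt) for fmt in ("<H", ">H", "<I", ">I")}
for fmt, size in sorted(standard_sizes.items()):
    print(fmt, size)

# Native sizes: chosen by the platform's C compiler, so they may vary.
print("@I", struct.calcsize("@I"))
print("@L", struct.calcsize("@L"))
```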

Old answer

Left here so that the comments make sense.

You could read it as two 2-byte values and convert them to an int manually:

>>> hi, lo = struct.unpack("<HH", b"\x05\x00\xC0\x00")
>>> n = (hi << 16) | lo
>>> n
327872
>>> "0X%08X" % n
'0X000500C0'
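For comparison, here is how the same four bytes decode under the plain 32-bit formats next to the two-halves approach. This is only a side-by-side sketch using the sample bytes from the snippet above; the variable names are arbitrary.

```python
import struct

raw = b"\x05\x00\xC0\x00"

(little,) = struct.unpack("<I", raw)   # one little-endian 32-bit value
(big,) = struct.unpack(">I", raw)      # one big-endian 32-bit value
hi, lo = struct.unpack("<HH", raw)     # two little-endian 16-bit halves
mixed = (hi << 16) | lo                # first half becomes the high word

print("0X%08X" % little)   # 0X00C00005
print("0X%08X" % big)      # 0X0500C000
print("0X%08X" % mixed)    # 0X000500C0
```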
jfs answered Nov 06 '22 07:11
