
Reading and storing arbitrary byte length integers from a file

Tags: python, numpy

I am attempting to speed up a binary file parser I wrote last year by doing the parsing/data accumulation in numpy. numpy's ability to define customized data structures and slurp data from a binary file into them looks like what I need, except some of the fields in these files are unsigned integers of "nonstandard" length (e.g. 6 bytes). Since I am using Python 2.7, I made my own emulated version of int.from_bytes to handle these fields, but if there is any way to read these fields to integers natively in numpy, that would obviously be much faster and preferable.

dpitch40 asked Jul 16 '12 15:07



1 Answer

NumPy doesn't support integers of arbitrary byte length, and using ctypes bitfields would be more trouble than it's worth.

I'd suggest using vectorised slicing to widen your data to the next larger standard integer size (here, 6-byte values into 8-byte integers):

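import numpy as np

buf = b"000000111111222222"  # three 6-byte big-endian unsigned integers
a = np.frombuffer(buf, dtype='>u2')  # view the bytes as 16-bit words, 3 per value
e = np.zeros(len(buf) // 6, dtype='>u8')  # one zero-filled 8-byte slot per value
for i in range(3):
    # Copy word i of every value into the low 6 bytes of its 8-byte slot;
    # the top word of each slot stays zero.
    e.view(dtype='>u2')[i + 1::4] = a[i::3]
[hex(x) for x in e]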
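
The same widening idea can be sketched for other field widths, and for a 6-byte field that sits inside a structured record. This is only an illustration under stated assumptions: `widen_uints` is a hypothetical helper name, and the record layout (`id`, `ts`, `value`) is invented for the example and not taken from the question. The odd-sized field is declared as six raw `uint8` bytes so that NumPy can still read the whole record in one pass.

```python
import numpy as np

def widen_uints(raw, width, byteorder=">"):
    """Widen packed `width`-byte unsigned integers to uint64 (hypothetical helper)."""
    if not 1 <= width <= 8:
        raise ValueError("width must be between 1 and 8 bytes")
    # One row per value, one column per byte.
    data = np.frombuffer(raw, dtype=np.uint8).reshape(-1, width)
    padded = np.zeros((data.shape[0], 8), dtype=np.uint8)
    if byteorder == ">":
        padded[:, 8 - width:] = data  # big-endian: value occupies the last bytes
    else:
        padded[:, :width] = data      # little-endian: value occupies the first bytes
    # Reinterpret each 8-byte row as a single unsigned 64-bit integer.
    return padded.view(byteorder + "u8").ravel()

# Assumed record layout, for illustration only:
# 2-byte id, 6-byte timestamp, 4-byte value, all big-endian.
record = np.dtype([("id", ">u2"), ("ts", np.uint8, (6,)), ("value", ">u4")])

raw = (b"\x00\x01" + b"\x00\x00\x00\x00\x01\x00" + b"\x00\x00\x00\x2a"
       + b"\x00\x02" + b"\xff" * 6 + b"\x00\x00\x00\x07")

recs = np.frombuffer(raw, dtype=record)
# recs["ts"] is an (n, 6) uint8 view; tobytes() makes a contiguous copy.
ts = widen_uints(recs["ts"].tobytes(), 6)
print(ts.tolist())  # [256, 281474976710655]
```

The padding approach copies the data once into an `(n, 8)` byte array, which costs extra memory but avoids a Python-level loop over individual values.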
ecatmur answered Oct 20 '22 01:10
