I am attempting to speed up a binary file parser I wrote last year by doing the parsing/data accumulation in numpy. numpy's ability to define customized data structures and slurp data from a binary file into them looks like what I need, except some of the fields in these files are unsigned integers of "nonstandard" length (e.g. 6 bytes). Since I am using Python 2.7, I made my own emulated version of int.from_bytes to handle these fields, but if there is any way to read these fields to integers natively in numpy, that would obviously be much faster and preferable.
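For reference, one way to emulate `int.from_bytes` for big-endian unsigned fields on Python 2.7 is to left-pad the field to 8 bytes and unpack with `struct` — a minimal sketch (the function name is mine, and it assumes fields of at most 8 bytes):

```python
import struct

def int_from_bytes_be(data):
    # Left-pad the big-endian field out to 8 bytes, then unpack it
    # as an unsigned 64-bit integer ('>Q'); handles widths up to 8.
    return struct.unpack('>Q', b'\x00' * (8 - len(data)) + data)[0]

print(int_from_bytes_be(b'\x01\x00\x00\x00\x00\x00'))  # 2**40 == 1099511627776
```

This is per-field Python-level work, so it is exactly the kind of inner loop the vectorised answer below avoids.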
NumPy doesn't support integers of arbitrary byte length, and emulating them with ctypes bitfields would be more trouble than it's worth.
I'd suggest using vectorised slicing to widen each 6-byte field into the next-larger standard integer size (8 bytes):
import numpy as np

# 18-byte buffer: three 6-byte big-endian records (Python 2 str is bytes)
buf = "000000111111222222"

# View the raw bytes as an array of big-endian 8-bit integers
a = np.ndarray(len(buf), np.dtype('>i1'), buf)

# One zero-filled big-endian 64-bit slot per 6-byte record
e = np.zeros(len(buf) / 6, np.dtype('>i8'))

# Copy each 2-byte word of every record into the low 6 bytes of its
# 64-bit slot; word 0 of each slot stays zero
for i in range(3):
    e.view(dtype='>i2')[i + 1::4] = a.view(dtype='>i2')[i::3]

[hex(x) for x in e]
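The same idea ports to Python 3 with only small adjustments: the buffer must be `bytes`, the division must be integer (`//`), and `np.frombuffer` is a convenient way to get the byte view (my substitution here; the `np.ndarray(..., buffer)` form also works). A sketch with three assumed sample records:

```python
import numpy as np

# Assumed sample buffer: three big-endian 6-byte unsigned integers
# with values 1, 0x1000000, and 0x123456789abc.
buf = (b'\x00\x00\x00\x00\x00\x01'
       b'\x00\x00\x01\x00\x00\x00'
       b'\x12\x34\x56\x78\x9a\xbc')

a = np.frombuffer(buf, dtype='>i1')      # raw bytes
e = np.zeros(len(buf) // 6, dtype='>i8')  # zeroed 64-bit slots

# Move each 2-byte word into the low 6 bytes of its 64-bit slot
for i in range(3):
    e.view('>i2')[i + 1::4] = a.view('>i2')[i::3]

print([hex(int(x)) for x in e])  # ['0x1', '0x1000000', '0x123456789abc']
```

Because the copies are whole-array slices, the Python loop runs only three times regardless of how many records the buffer holds.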