I'm using array.array to store many fixed-size numerical records in binary format in a large file, e.g. by writing array.array('l', range(20)).tofile(fout). I would like to process the file in parallel, in chunks. How can I calculate the offsets to use with seek() so that each chunk starts at a record boundary?
Let's take an array object:
>>> import array
>>> a = array.array('l', range(20))
The size of each element, in bytes:
>>> a.itemsize
4
Write it out:
>>> f = open('array.dat', "wb")
>>> a.tofile(f)
>>> f.close()
Sanity check:
>>> import os
>>> os.stat('array.dat').st_size
80L
>>> len(a) * a.itemsize
80
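So the file has the expected number of bytes. Conversely, given only the file, you can recover the record count by dividing its size by the item size (assuming every record was written with the same typecode):
>>> os.stat('array.dat').st_size // a.itemsize
20L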
Read back the value at index 7:
>>> f = open('array.dat', 'rb')
>>> f.seek(7 * a.itemsize)
>>> raw = f.read(a.itemsize)
>>> import struct
>>> struct.unpack(a.typecode, raw)
(7,)
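To answer the original question directly: to chunk the file at record boundaries, give each chunk a whole number of records, so that every offset is a multiple of a.itemsize. A minimal sketch, assuming 4 parallel workers (the chunk count is arbitrary):
>>> n_chunks = 4                                # hypothetical number of parallel workers
>>> n_records = os.stat('array.dat').st_size // a.itemsize
>>> per_chunk = -(-n_records // n_chunks)       # ceiling division: records per chunk
>>> offsets = [i * per_chunk * a.itemsize for i in range(n_chunks)]
>>> f.seek(offsets[1])                          # each worker seeks to its own offset...
>>> chunk = array.array(a.typecode)
>>> chunk.fromfile(f, per_chunk)                # ...and reads per_chunk whole records
>>> list(chunk)
[5, 6, 7, 8, 9]
The last chunk may contain fewer than per_chunk records; array.fromfile raises EOFError in that case, but the items that were available are still appended to the array.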
Clear?