With a text file, I can write this:
```python
with open(path, 'r') as file:
    for line in file:
        ...  # handle the line
```
This is equivalent to this:
```python
with open(path, 'r') as file:
    for line in iter(file.readline, ''):
        ...  # handle the line
```
This idiom is documented in PEP 234, but I have failed to locate a similar idiom for binary files.
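The two-argument form `iter(callable, sentinel)` calls `callable` with no arguments on every step and stops as soon as the returned value equals `sentinel`. A small check of the text-file equivalence, assuming `io.StringIO` as a stand-in for a real file so the sketch runs without touching the disk:

```python
import io

text = "first line\nsecond line\nthird line\n"

# Plain iteration over a text file object yields one line at a time.
with io.StringIO(text) as f:
    by_iteration = list(f)

# iter(callable, sentinel) calls f.readline() until it returns '' (end of file).
with io.StringIO(text) as f:
    by_readline = list(iter(f.readline, ''))

print(by_iteration == by_readline)  # True
```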
With a binary file, I can write this:
```python
with open(path, 'rb') as file:
    while True:
        chunk = file.read(1024 * 64)
        if not chunk:
            break
        ...  # handle the chunk
```
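On Python 3.8 and later, an assignment expression (the `:=` operator from PEP 572) can fold the read and the end-of-file check into the loop condition. A sketch, assuming `io.BytesIO` in place of a file opened with `'rb'` and a deliberately tiny chunk size of 4 so the chunking is visible:

```python
import io

data = bytes(range(10))  # ten bytes: 0x00 .. 0x09
chunks = []

with io.BytesIO(data) as file:
    # read() returns b'' at end of file; b'' is falsy, so the loop ends there.
    while chunk := file.read(4):
        chunks.append(chunk)  # handle the chunk

print(chunks)
```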
I have tried the same idiom as with a text file:
```python
def make_read(file, size):
    def read():
        return file.read(size)
    return read

with open(path, 'rb') as file:
    for chunk in iter(make_read(file, 1024 * 64), b''):
        ...  # handle the chunk
```
Is this the idiomatic way to iterate over a binary file in Python?
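For comparison, the standard library already provides what `make_read` does: `functools.partial(file.read, size)` is a zero-argument callable that calls `file.read(size)`. A sketch of the same loop written that way, with `io.BytesIO` assumed in place of a file opened with `'rb'`. The sentinel stays `b''`, because in binary mode `read()` returns `bytes`:

```python
import functools
import io

CHUNK_SIZE = 1024 * 64  # 65536 bytes

data = b"\x00" * (CHUNK_SIZE * 2 + 100)  # a little over two chunks of zero bytes
sizes = []

with io.BytesIO(data) as file:
    # partial(file.read, CHUNK_SIZE) behaves like make_read(file, CHUNK_SIZE).
    for chunk in iter(functools.partial(file.read, CHUNK_SIZE), b''):
        sizes.append(len(chunk))  # handle the chunk

print(sizes)  # [65536, 65536, 100]
```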
You can choose one of two methods for loading the data. 1) Use the commands open file, read from file and close file. 2) Use the URL keyword with the put command, prefixing the file path with "binfile:". Either approach allows you to place binary data into a variable so that it can be processed.
I don't know of any built-in way to do this, but a wrapper function is easy enough to write:
```python
def read_in_chunks(infile, chunk_size=1024*64):
    while True:
        chunk = infile.read(chunk_size)
        if chunk:
            yield chunk
        else:
            # The chunk was empty, which means we're at the end
            # of the file
            return
```
Then at the interactive prompt:
```python
>>> from chunks import read_in_chunks
>>> infile = open('quicklisp.lisp')
>>> for chunk in read_in_chunks(infile):
...     print(chunk)
...
<contents of quicklisp.lisp in chunks>
```
Of course, you can easily adapt this to use a with block:
```python
with open('quicklisp.lisp') as infile:
    for chunk in read_in_chunks(infile):
        print(chunk)
```
And you can eliminate the if statement like this:
```python
def read_in_chunks(infile, chunk_size=1024*64):
    chunk = infile.read(chunk_size)
    while chunk:
        yield chunk
        chunk = infile.read(chunk_size)
```
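Both versions of `read_in_chunks` leave opening and closing the file to the caller. A variant, sketched here under the hypothetical name `iter_file_chunks`, moves the `open()` call (in `'rb'` mode, since the question is about binary files) into the generator itself. The temporary file in the usage part is only there so the sketch runs on its own:

```python
import os
import tempfile


def iter_file_chunks(path, chunk_size=1024 * 64):
    """Yield successive chunks of the file at `path`, opened in binary mode.

    Hypothetical helper: unlike read_in_chunks, it opens and closes the file itself.
    """
    with open(path, 'rb') as infile:
        while True:
            chunk = infile.read(chunk_size)
            if not chunk:
                # An empty bytes object means end of file.
                return
            yield chunk


# Usage: write a small temporary file and read it back three bytes at a time.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdefgh")
    tmp_path = tmp.name

try:
    chunks = list(iter_file_chunks(tmp_path, chunk_size=3))
finally:
    os.remove(tmp_path)

print(chunks)  # [b'abc', b'def', b'gh']
```

One trade-off of this design: the file is closed when the generator is exhausted or closed, so if a caller breaks out of the loop early, the file stays open until the generator's `close()` is called or it is garbage-collected.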
Try:
```python
>>> i = 0
>>> with open('dups.txt', 'rb') as f:
...     for chunk in iter((lambda: f.read(how_many_bytes_you_want_each_time)), b''):
...         i += 1
...
```
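A note on the sentinel: `iter()` stops only when the callable returns a value equal to the sentinel, and in Python 3 an empty `bytes` object never compares equal to an empty `str`. With a file opened in `'rb'` mode the sentinel therefore has to be `b''`; a `''` sentinel would never match and the loop would not terminate. A quick check, assuming `io.BytesIO` in place of `dups.txt` and an arbitrary chunk size of 4:

```python
import io

print(b'' == '')   # False: empty bytes is not equal to empty str
print(b'' == b'')  # True

f = io.BytesIO(b"0123456789")
i = 0
# With the bytes sentinel b'', iteration stops cleanly at end of file.
for chunk in iter(lambda: f.read(4), b''):
    i += 1

print(i)  # 3 chunks: b'0123', b'4567', b'89'
```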
`iter` needs a function that takes zero arguments:

- `f.read` would read the whole file in one call, since the size parameter is missing;
- `f.read(1024)` means "call the function and pass its return value (the data loaded from the file) to `iter`", so `iter` does not get a function at all;
- `(lambda: f.read(1234))` is a function that takes zero arguments (there is nothing between `lambda` and `:`) and calls `f.read(1234)`.

The following are equivalent:
```python
somefunction = (lambda: f.read(how_many_bytes_you_want_each_time))
```
and
```python
def somefunction():
    return f.read(how_many_bytes_you_want_each_time)
```
With either of these defined before your code, you could simply write `iter(somefunction, b'')`.
Technically, you can omit the parentheses around the lambda; Python's grammar accepts that.
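The three cases in the list above can be checked directly. This sketch assumes `io.BytesIO` in place of a real file and a chunk size of 4 chosen only to keep the output short. Passing `f.read(4)` hands `iter()` a `bytes` object rather than a callable, so the two-argument form raises `TypeError`:

```python
import io

f = io.BytesIO(b"abcdefghij")

# 1) f.read with no size: the first call returns everything, the second returns b''.
whole = list(iter(f.read, b''))
print(whole)  # [b'abcdefghij']

# 2) f.read(4) is evaluated immediately, so iter() receives bytes, not a callable.
f.seek(0)
raised = False
try:
    iter(f.read(4), b'')
except TypeError:
    raised = True
print(raised)  # True

# 3) A zero-argument lambda (no parentheses needed) defers the call,
#    so iter() can call it again on every step.
f.seek(0)
chunks = list(iter(lambda: f.read(4), b''))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```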