
What is the idiomatic way to iterate over a binary file?

With a text file, I can write this:

    with open(path, 'r') as file:
        for line in file:
            pass  # handle the line

This is equivalent to this:

    with open(path, 'r') as file:
        for line in iter(file.readline, ''):
            pass  # handle the line

This idiom is documented in PEP 234 but I have failed to locate a similar idiom for binary files.

With a binary file, I can write this:

    with open(path, 'rb') as file:
        while True:
            chunk = file.read(1024 * 64)
            if not chunk:
                break
            # handle the chunk

I have tried the same idiom as with a text file:

    def make_read(file, size):
        def read():
            return file.read(size)
        return read

    with open(path, 'rb') as file:
        for chunk in iter(make_read(file, 1024 * 64), b''):
            pass  # handle the chunk

Is this the idiomatic way to iterate over a binary file in Python?

dawg asked Dec 30 '10 21:12



2 Answers

I don't know of any built-in way to do this, but a wrapper function is easy enough to write:

    def read_in_chunks(infile, chunk_size=1024*64):
        while True:
            chunk = infile.read(chunk_size)
            if chunk:
                yield chunk
            else:
                # The chunk was empty, which means we're at the end
                # of the file
                return

Then, at the interactive prompt (assuming the function is saved in chunks.py, and opening the file in binary mode):

    >>> from chunks import read_in_chunks
    >>> infile = open('quicklisp.lisp', 'rb')
    >>> for chunk in read_in_chunks(infile):
    ...     print(chunk)
    ...
    <contents of quicklisp.lisp in chunks>

Of course, you can easily adapt this to use a with block:

    with open('quicklisp.lisp', 'rb') as infile:
        for chunk in read_in_chunks(infile):
            print(chunk)

You can also eliminate the if statement like this:

    def read_in_chunks(infile, chunk_size=1024*64):
        chunk = infile.read(chunk_size)
        while chunk:
            yield chunk
            chunk = infile.read(chunk_size)
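
On Python 3.8 or later, the same generator can arguably be written more compactly with an assignment expression. This is a sketch rather than part of the answer above; the in-memory io.BytesIO object and the 4-byte chunk size are stand-ins chosen only so the example runs without a file on disk.

```python
import io


def read_in_chunks(infile, chunk_size=1024 * 64):
    """Yield successive chunks from a file object until read() returns empty."""
    # The walrus operator (Python 3.8+) assigns and tests in one step;
    # an empty bytes object is falsy, which ends the loop at end of file.
    while chunk := infile.read(chunk_size):
        yield chunk


# Hypothetical data standing in for a real file opened with open(path, 'rb').
demo = io.BytesIO(b"abcdefghij")
chunks = list(read_in_chunks(demo, chunk_size=4))
print(chunks)
```

The behaviour is the same as the two versions above; only the loop header changes, so the choice is mostly a matter of taste and of which Python versions need to be supported.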

Jason Baker answered Oct 02 '22 06:10


Try:

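    >>> i = 0
    >>> with open('dups.txt', 'rb') as f:
    ...     for chunk in iter((lambda: f.read(how_many_bytes_you_want_each_time)), b''):
    ...         i += 1

Because the file is opened in 'rb' mode, f.read returns bytes, so the sentinel must be b''; on Python 3 a sentinel of '' never compares equal to it and the loop would not terminate.

The two-argument form of iter needs a function that takes zero arguments: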

  • a plain f.read would read the whole file, since the size parameter is missing;
  • f.read(1024) means call a function and pass its return value (data loaded from file) to iter, so iter does not get a function at all;
  • (lambda:f.read(1234)) is a function that takes zero arguments (nothing between lambda and :) and calls f.read(1234).
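
The three cases in the list above can be checked on an in-memory file. This is only an illustrative sketch: io.BytesIO, the 10-byte payload, and the chunk size of 4 are assumptions made for the demo, and b'' is used as the sentinel because binary reads return bytes.

```python
import io

data = b"0123456789"

# Case 1: plain f.read as the callable. With no size argument the first
# call returns the whole file, so iteration produces a single big chunk.
f = io.BytesIO(data)
whole = list(iter(f.read, b""))

# Case 2: f.read(4) is not a callable; it is already the bytes that were read.
f = io.BytesIO(data)
already_read = f.read(4)
is_callable = callable(already_read)

# Case 3: a zero-argument lambda that iter() can call again and again.
f = io.BytesIO(data)
pieces = list(iter(lambda: f.read(4), b""))

print(whole)
print(is_callable)
print(pieces)
```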

The following two definitions are equivalent:

    somefunction = (lambda: f.read(how_many_bytes_you_want_each_time))

and

    def somefunction():
        return f.read(how_many_bytes_you_want_each_time)

With either of these defined before your loop, you could simply write iter(somefunction, b''). The sentinel is b'' rather than '' because a file opened in binary mode returns bytes.

Technically, you can omit the parentheses around the lambda; Python's grammar accepts that.
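
Another spelling that avoids both the lambda and a helper such as the question's make_read is functools.partial from the standard library. The sketch below is an addition, not something this answer proposes; the temporary file and the chunk size of 4 bytes are placeholders so the example is self-contained.

```python
import functools
import os
import tempfile

# Create a small throwaway binary file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdefghij")
    path = tmp.name

chunk_size = 4  # arbitrary demo value; something like 64 * 1024 is more typical

with open(path, "rb") as f:
    # partial(f.read, chunk_size) is a zero-argument callable equivalent to
    # lambda: f.read(chunk_size); b"" is the sentinel that marks end of file.
    chunks = list(iter(functools.partial(f.read, chunk_size), b""))

os.remove(path)
print(chunks)
```

Whether partial reads better than the lambda is a judgment call; both hand iter a callable that takes no arguments.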

liori answered Oct 02 '22 07:10