Is the File-Object iterator "broken?"

According to the documentation:

Once an iterator’s __next__() method raises StopIteration, it must continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken.
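To make the rule concrete, here is a minimal sketch of an iterator that obeys it by remembering that it has already been exhausted. StickyIterator is a hypothetical helper, not part of the standard library, and io.StringIO stands in for an open text file because its line iterator behaves the same way after a seek.

```python
import io


class StickyIterator:
    """Hypothetical wrapper that enforces the iterator protocol's rule:
    once StopIteration has been raised, keep raising it."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._exhausted = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._exhausted:
            raise StopIteration
        try:
            return next(self._it)
        except StopIteration:
            # Remember exhaustion so later calls never touch the source again.
            self._exhausted = True
            raise


stream = io.StringIO("a\nb\n")        # stand-in for an open text file
wrapped = StickyIterator(stream)
print(list(wrapped))                  # ['a\n', 'b\n']
stream.seek(0)                        # rewinding the underlying stream...
print(next(wrapped, "<exhausted>"))   # ...does not revive the wrapper: <exhausted>
print(repr(next(stream)))             # but the raw stream iterator restarts: 'a\n'
```

The wrapper only changes what callers of the wrapper see; the underlying stream is still free to produce more data.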

However, for file-objects:

>>> f = open('test.txt')
>>> list(f)
['a\n', 'b\n', 'c\n', '\n']
>>> next(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> f.seek(0)
0
>>> next(f)
'a\n'
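The same behaviour can be reproduced in a script. The sketch below, assuming CPython 3 and a writable temporary directory (the file name just mirrors the question's test.txt), also shows the case that matters in practice: data appended to the file after the iterator has raised StopIteration is returned by the next call to next().

```python
import os
import tempfile

# Recreate the question's test.txt in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w") as out:
    out.write("a\nb\nc\n\n")

with open(path) as f:
    lines = list(f)               # ['a\n', 'b\n', 'c\n', '\n']
    after_end = next(f, None)     # None: StopIteration was raised
    f.seek(0)
    rewound = next(f)             # 'a\n': iteration restarts after a seek
    rest = list(f)                # ['b\n', 'c\n', '\n']: exhausted again
    # Simulate another process appending to the file (e.g. a log writer).
    with open(path, "a") as writer:
        writer.write("d\n")
    revived = next(f)             # 'd\n': the "exhausted" iterator yields again
```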

Are file-object iterators broken? Is this just one of those things that can't be fixed because it would break too much existing code that relies on it?

asked Aug 15 '18 16:08 by juanpa.arrivillaga

1 Answer

Yes, file iterators are "deemed broken" according to the section of the stdtypes documentation quoted in the question. Both the Python 3 TextIOWrapper iterator and the Python 2 file iterator are broken.

This is worth keeping in mind if you're using code that assumes iterators strictly adhere to the iterator protocol. To give one example, the pure-Python implementation of itertools.dropwhile is buggy when combined with a file iterator: you might run into problems when iterating over a log file while another process is still appending lines to it.
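Here is a sketch of how that goes wrong. The dropwhile function below follows the rough pure-Python equivalent given in the itertools documentation; BurstIterator is a hypothetical iterator that raises StopIteration between bursts of data, standing in for a log file that another process appends to. When dropwhile's first loop runs out without finding a non-matching item, its second loop yields the next burst without ever testing the predicate.

```python
def dropwhile(predicate, iterable):
    """Roughly the pure-Python equivalent from the itertools docs."""
    iterator = iter(iterable)
    for x in iterator:
        if not predicate(x):
            yield x
            break
    # Assumes the iterator stays exhausted if the loop above ran out.
    for x in iterator:
        yield x


class BurstIterator:
    """Hypothetical 'broken' iterator: raises StopIteration after each burst
    of items, then resumes with the next burst, like a file being appended to."""

    def __init__(self, bursts):
        self._bursts = [list(burst) for burst in bursts]

    def __iter__(self):
        return self

    def __next__(self):
        if not self._bursts:
            raise StopIteration
        if self._bursts[0]:
            return self._bursts[0].pop(0)
        # Current burst is drained: signal "no data right now", then move on.
        self._bursts.pop(0)
        raise StopIteration


def is_skippable(line):
    return line.startswith("skip")


bursts = [["skip1", "skip2"], ["skip3", "keep"]]
flat = [item for burst in bursts for item in burst]

correct = list(dropwhile(is_skippable, flat))                 # ['keep']
buggy = list(dropwhile(is_skippable, BurstIterator(bursts)))  # ['skip3', 'keep']
```

"skip3" leaks through because it arrives after the first loop has already given up.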

This question was discussed on the mailing lists; search the September 2008 archives for the thread titled Why are "broken iterators" broken? A couple of quotes:

Miles:

Strictly speaking, file objects are broken iterators.

Fredrik Lundh:

It's a design guideline, not an absolute rule.

And Terry Reedy:

It is quite possible that a stream reader will return '' on one call and then something non-empty the next. An iterator that reads a stream and yields chunks of whatever size should either block until it gets sufficient data or yield nulls as long as the stream is open and not raise StopIteration until the stream is closed and it has yielded the last chunk of data.

There is an important difference between a store that is closed until the next day and one that is closed - out of business. Similarly, there is a difference between an item being out-of-stock until the next delivery and out-of-stock and discontinued permanently, or between a road closed for repairs versus removal for something else. Using the same sign or signal for temporary and permanent conditions is confusing and therefore 'broken'.
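Terry Reedy's "yield nulls" option can be sketched as a generator that yields '' whenever no new data is available and stops only once the stream is finished. The is_finished callback is a hypothetical stand-in for however a real program learns that the writer is done (a sentinel line, a closed pipe, a process check); sleeping between polls, partially written lines, and error handling are left out.

```python
import os
import tempfile


def follow(f, is_finished):
    """Yield lines from text file f. Yield '' while no data is available,
    and stop only after is_finished() returns True (hypothetical callback)."""
    while True:
        line = f.readline()
        if line:
            yield line
        elif is_finished():
            # The writer may have appended data between readline() and
            # is_finished(), so drain whatever is left before stopping.
            yield from f.readlines()
            return
        else:
            yield ""  # temporarily out of data, not permanently exhausted


# Usage: a writer appends to a file while a reader follows it.
path = os.path.join(tempfile.mkdtemp(), "log.txt")
with open(path, "w") as out:
    out.write("first\n")

done = False
with open(path) as f:
    lines = follow(f, lambda: done)
    seen = [next(lines), next(lines)]   # ['first\n', '']
    with open(path, "a") as writer:
        writer.write("second\n")
    seen.append(next(lines))            # 'second\n'
    done = True
    seen.extend(lines)                  # nothing left; the generator returns
```

Here StopIteration means "closed for good", while '' means "closed until the next delivery", which is exactly the distinction the quote asks for.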

I think this behavior is unlikely to change in the language ("Practicality beats purity"), but perhaps the language in the docs will be softened up. There is an existing open issue about that, if you want to follow it: issue23455

answered Oct 10 '22 19:10 by wim