A generator-returning function (i.e. one with a yield statement in it) in one of our libraries fails some tests due to an unhandled StopIteration exception. For convenience, in this post I'll refer to this function as buggy.
I have not been able to find a way for buggy to prevent the exception (without affecting the function's normal operation). Similarly, I have not found a way to trap the exception (with a try/except) within buggy.
(Client code using buggy can trap this exception, but this happens too late, because the code that has the information necessary to properly handle the condition leading to this exception is the buggy function.)
The actual code and test case I am working with are far too complicated to post here, so I have created a very simple, but also extremely artificial toy example that illustrates the problem.
First, the module with the buggy function:
    # mymod.py
    import csv  # essential!

    def buggy(csvfile):
        with open(csvfile) as stream:
            reader = csv.reader(stream)
            # how to test *here* if either stream is at its end?
            for row in reader:
                yield row
As indicated by the comment, the use of the csv module (from the Python 3.x standard library) is an essential feature of this problem [1].
The next file for the example is a script that is meant to stand in for "client code". In other words, this script's "real purpose" beyond this example is largely irrelevant. Its role in the example is to provide a simple, reliable way to elicit the problem with the buggy function. (Some of its code could be repurposed for a test case in a test suite, for example.)
    #!/usr/bin/env python3
    # myscript.py

    import sys

    import mymod

    def print_row(row):
        print(*row, sep='\t')

    def main(csvfile, mode=None):
        if mode == 'first':
            print_row(next(mymod.buggy(csvfile)))
        else:
            for row in mymod.buggy(csvfile):
                print_row(row)

    if __name__ == '__main__':
        main(*sys.argv[1:])
The script takes the path to a CSV file as a mandatory argument, and an optional second argument. If the second argument is omitted, or is anything other than the string "first", the script prints the information in the CSV file to stdout, but in TSV format. If the second argument is the string "first", only the information in the first row is printed.
The StopIteration exception I am trying to trap arises when the myscript.py script is invoked with an empty file and the string "first" as arguments [2].
Here is an example of this code in action:
% cat ok_input.csv
1,2,3
4,5,6
7,8,9
% ./myscript.py ok_input.csv
1 2 3
4 5 6
7 8 9
% ./myscript.py ok_input.csv first
1 2 3
% cat empty_input.csv
# no output (of course)
% ./myscript.py empty_input.csv
# no output (as desired)
% ./myscript.py empty_input.csv first
Traceback (most recent call last):
  File "./myscript.py", line 19, in <module>
    main(*sys.argv[1:])
  File "./myscript.py", line 13, in main
    print_row(next(mymod.buggy(csvfile)))
StopIteration
Q: How can I prevent or trap this StopIteration exception in the lexical scope of the buggy function?
IMPORTANT: Please keep in mind that, in the example given above, the myscript.py script is a stand-in for "client code", and is therefore outside of our control. This means that any approach that would require changing the myscript.py script would not solve the actual real-world problem, and therefore would not be an acceptable answer to this question.
One important difference between the simple example shown above and our actual situation is that in our case, the problematic input stream does not come from an empty file. The problem arises in cases where buggy (or, rather, its real-world counterpart) reaches the end of this stream "too early", so to speak.
I think it may be enough if I could test whether stream is at its end before the for row in reader: line, but I have not figured out a way to do this either. Testing whether the length of the string returned by stream.read(1) is 0 or 1 will tell me if stream is at its end, but in the latter case stream's internal pointer will be left pointing one byte too far into csvfile's content. (Neither stream.seek(-1, 1) nor stream.tell() works at this point.)
Lastly, to anyone who would like to post an answer to this question: it would be most efficient if you were to take advantage of the example code I have provided above to test your proposal before posting it.
EDIT: One variation of mymod.py that I tried was this:
    import csv  # essential!

    def buggy(csvfile):
        with open(csvfile) as stream:
            reader = csv.reader(stream)
            try:
                firstrow = next(reader)
            except StopIteration:
                firstrow = None
            if firstrow != None:
                yield firstrow
            for row in reader:
                yield row
This variation fails with pretty much the same error message as does the original version.
When I first read @mcernak's proposal, I thought that it was pretty similar to the variation above, and therefore expected it to fail too. Then I was pleasantly surprised to discover that this is not the case! Therefore, as of now, there is one definite candidate to get the bounty. That said, I would love to understand why the variation above fails to trap the exception, while @mcernak's succeeds.
[1] The actual case I'm dealing with is legacy code; switching from the csv module to some alternative is not an option for us in the short term.
[2] Please disregard entirely the question of what this demonstration script's "right response" should be when it gets invoked with an empty file and the string "first" as arguments. The particular combination of inputs that elicits the StopIteration exception in this post's demonstration does not represent the real-world condition that causes our code to emit the problematic StopIteration exception. Therefore, the "correct response", whatever that may be, of the demonstration script to the empty file plus "first" string combination would be irrelevant to the real-world problem I am dealing with.
You can trap the StopIteration exception in the lexical scope of the buggy function this way:
    import csv  # essential!

    def buggy(csvfile):
        with open(csvfile) as stream:
            reader = csv.reader(stream)
            try:
                yield next(reader)
            except StopIteration:
                yield 'dummy value'
            for row in reader:
                yield row
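You basically manually request the first value from the reader iterator and:
- if this succeeds, the first row is read from the csv file and is yielded to the caller of the buggy function;
- if this fails, as is the case for empty csv files, some placeholder such as the string dummy value is yielded to prevent the caller of the buggy function from crashing.
Afterwards, if the csv file was not empty, the remaining rows will be read (and yielded) in the for cycle.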
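As a quick check of the behaviour described above, here is a minimal self-contained sketch. It repeats the buggy function from this answer and runs it against two temporary files. The write_temp helper and the file contents are made up for the illustration and are not part of the original code.

```python
import csv
import os
import tempfile


def buggy(csvfile):
    # Same shape as the function above: yield a placeholder when there is no first row.
    with open(csvfile) as stream:
        reader = csv.reader(stream)
        try:
            yield next(reader)
        except StopIteration:
            yield 'dummy value'
        for row in reader:
            yield row


def write_temp(text):
    # Hypothetical helper: write text to a temporary file and return its path.
    fd, path = tempfile.mkstemp(suffix='.csv')
    with os.fdopen(fd, 'w') as handle:
        handle.write(text)
    return path


empty_path = write_temp('')
full_path = write_temp('1,2,3\n4,5,6\n')

# Mimic the client's "first" mode: ask for one value only.
gen = buggy(empty_path)
first_of_empty = next(gen)
gen.close()  # release the open file before deleting it

# Mimic the client's default mode: consume every row.
rows_of_full = list(buggy(full_path))

print(first_of_empty)
print(rows_of_full)

os.remove(empty_path)
os.remove(full_path)
```

With the empty file, the caller's next() receives the placeholder instead of an exception. With the non-empty file, the rows come through unchanged, so normal operation is not affected.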
EDIT: to illustrate why the other variation of mymod.py mentioned in the question does not work, I've added some print statements to it:
    import csv  # essential!

    def buggy(csvfile):
        with open(csvfile) as stream:
            reader = csv.reader(stream)
            try:
                print('reading first row')
                firstrow = next(reader)
            except StopIteration:
                print('no first row exists')
                firstrow = None
            if firstrow != None:
                # rows are lists, so convert them before concatenating
                print('yielding first row: ' + str(firstrow))
                yield firstrow
            for row in reader:
                print('yielding next row: ' + str(row))
                yield row
        print('exiting function open')
Running it gives the following output:
% ./myscript.py empty_input.csv first
reading first row
no first row exists
exiting function open
Traceback (most recent call last):
  File "myscript.py", line 15, in <module>
    main(*sys.argv[1:])
  File "myscript.py", line 9, in main
    print_row(next(mymod.buggy(csvfile)))
StopIteration
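This shows that when the input file is empty, the first try..except block correctly handles the StopIteration exception and the buggy function continues normally.
The exception that the caller of buggy gets in this case is due to the fact that the buggy function does not yield any value before completing. When a generator finishes without yielding anything, the caller's next() call itself raises StopIteration, and that happens outside the function body, where no try/except inside buggy can catch it.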
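The same effect can be reproduced without csv at all. The sketch below uses a made-up generator, yields_nothing, that returns before producing a value. It is only an illustration of the mechanism, not code from the question.

```python
def yields_nothing():
    # The unreachable `yield` only serves to make this a generator function.
    if False:
        yield


gen = yields_nothing()
try:
    next(gen)
    outcome = 'got a value'
except StopIteration:
    # Raised by next() in the caller, because the generator finished without yielding.
    outcome = 'StopIteration raised in the caller'

print(outcome)

# The two-argument form of next() returns the default instead of raising.
fallback = next(yields_nothing(), 'no rows')
print(fallback)
```

The two-argument next() is shown only to make the mechanism visible. It is a caller-side change, so it does not apply when the client code is outside your control, which is why the fix above yields a value from inside buggy instead.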