
Why does my python code run as expected in the debugger but not otherwise?

I wrote a parser in python3.6; I simplified it as much as possible while still producing the bug:

def tokenize(expr):
    for i in expr:
        try:
            yield int(i)
        except ValueError:
            yield i


def push_on_stream(obj, stream):
    yield obj
    yield from stream


class OpenBracket:
    "just a token value, could have used Ellipsis"
    pass


def parse_toks(tokstream):
    result = []
    leading_brak = False
    for tok in tokstream:
        if tok == OpenBracket:
            leading_brak = True
        elif tok == '(':
            result.append(parse_toks(
                push_on_stream(OpenBracket, tokstream)))
        elif tok == ')':
            if not leading_brak:
                raise SyntaxError("Very bad ')'.")
            break
        else:
            result.append(tok)
    return sum(result)


def test(expr="12(34)21"):
    tokens = tokenize(expr)
    print( parse_toks(tokens) )
    print(list(tokens))

test()

This example is trivial; the effect should be to add all the digits in a string, including digits in brackets.

A tokenize() function yields tokens and a parse_toks() function parses the token stream. If it comes across an open parenthesis, it recurses (pushing OpenBracket onto the token stream), which should have the effect of treating the digits in the parentheses as a separate expression, parsing it and appending its sum to the result list.

When I run the parser normally, e.g. on the expression "1(2)3", it ends immediately after the close bracket, returning 3, and in fact the token stream seems to have ended.

When I run it under pdb, however, and set breakpoints inside the loop in parse_toks, I can step carefully while it is processing the ')' and the program correctly returns 6.

I think the bug is something to do with yielding from the token stream in push_on_stream().

Is this a bug in the interpreter? If so is there a good workaround?

I wrote it for python-3.6, but I also tested it on python-3.7 on a different machine with the same result.

Silas Coker asked Mar 05 '23 21:03



2 Answers

Your push_on_stream doesn't quite work the way you think it should.

See, when the push_on_stream generator is reclaimed, Python calls close() on it, which raises GeneratorExit inside the generator so that any finally blocks and __exit__ methods run. Because push_on_stream delegates with yield from, if it is suspended inside the yield from at that point, the close() is forwarded to the underlying tokenize generator as well.

This immediately terminates the token stream. Under pdb, something kept the push_on_stream generator from being collected, so close() never ran and the effect disappeared.
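The forwarding of close() can be seen in isolation. The following is a minimal sketch, not the asker's parser: source(), delegating(), looping() and remaining_after_close() are hypothetical names introduced here, and close() is called explicitly to stand in for what happens when the wrapper generator is reclaimed.

```python
def source():
    """Stand-in for the question's tokenize(): yields 0..4."""
    for i in range(5):
        yield i


def delegating(stream):
    # Same shape as the original push_on_stream: delegate with yield from.
    yield from stream


def looping(stream):
    # Hypothetical alternative: re-yield items without delegating.
    for item in stream:
        yield item


def remaining_after_close(wrap):
    src = source()
    wrapper = wrap(src)
    next(wrapper)      # consume one item so the wrapper is suspended mid-stream
    wrapper.close()    # what Python does when the wrapper is reclaimed
    return list(src)   # whatever the underlying generator can still produce


remaining_after_delegating = remaining_after_close(delegating)
remaining_after_looping = remaining_after_close(looping)
print(remaining_after_delegating)  # []
print(remaining_after_looping)     # [1, 2, 3, 4]
```

With yield from, closing the wrapper also closes the underlying generator, so nothing is left to read. A plain for loop does not forward the close, so the remaining items survive.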

user2357112 supports Monica answered Mar 11 '23 11:03



Hypothesis

When the break statement leaves the loop and the inner parse_toks call returns, the push_on_stream generator is discarded and a GeneratorExit exception is raised, which propagates through the generators. pdb modifies how this plays out, which is exactly the sort of subtle difference I'd expect it to introduce, so the generator that push_on_stream is yielding from does not get closed.

Test

If we change push_on_stream from:

def push_on_stream(obj, stream):
    yield obj
    yield from stream

to:

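def push_on_stream(obj, stream):
    yield obj
    # A plain loop does not forward close() to stream, unlike yield from.
    # (A bare next(stream) inside a generator would turn the final
    # StopIteration into RuntimeError on Python 3.7+.)
    for item in stream:
        yield item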

then the underlying stream is no longer closed along with push_on_stream, and the behaviour is correct both with and without pdb.
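As a quick end-to-end check, here is a self-contained sketch of the question's parser with a non-delegating push_on_stream. It is written with a plain for loop, which is an assumption on my part (it avoids a StopIteration escaping the generator when the stream runs out), and evaluate() is a hypothetical helper added only for the check.

```python
def tokenize(expr):
    for ch in expr:
        try:
            yield int(ch)
        except ValueError:
            yield ch


def push_on_stream(obj, stream):
    yield obj
    # Re-yield items one by one; closing this generator leaves stream open.
    for item in stream:
        yield item


class OpenBracket:
    """Sentinel token, as in the question."""


def parse_toks(tokstream):
    result = []
    leading_brak = False
    for tok in tokstream:
        if tok is OpenBracket:
            leading_brak = True
        elif tok == '(':
            result.append(parse_toks(
                push_on_stream(OpenBracket, tokstream)))
        elif tok == ')':
            if not leading_brak:
                raise SyntaxError("Very bad ')'.")
            break
        else:
            result.append(tok)
    return sum(result)


def evaluate(expr):
    # Hypothetical convenience wrapper, not part of the original code.
    return parse_toks(tokenize(expr))


print(evaluate("12(34)21"))  # 13
print(evaluate("1(2)3"))     # 6
```

Both expressions now sum every digit, including those after the closing bracket.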

Result

Bug fixed!

Explanation

Provided better by user2357112's answer. Basically, yield from doesn't work the way you'd think it does; when the push_on_stream generator is closed after the break statement, yield from forwards the close to the generator it is delegating to, which from then on behaves as exhausted. (pdb interrupts this, because it's a slightly buggy pain.) This leads to your parser terminating at the first ), because the underlying iterator is stopped once the first break statement has run.
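One way to see why the timing of that close matters: if something holds an extra reference to the push_on_stream generator, it is not closed when the inner call returns, and the original yield from version behaves correctly. The sketch below is only an illustration of that effect. The keepalive list is a hypothetical addition, and whether pdb does anything equivalent is an assumption, not something the answers establish.

```python
def tokenize(expr):
    for ch in expr:
        try:
            yield int(ch)
        except ValueError:
            yield ch


def push_on_stream(obj, stream):
    # Original, delegating version from the question.
    yield obj
    yield from stream


class OpenBracket:
    """Sentinel token, as in the question."""


def parse_toks(tokstream, keepalive):
    result = []
    leading_brak = False
    for tok in tokstream:
        if tok is OpenBracket:
            leading_brak = True
        elif tok == '(':
            wrapper = push_on_stream(OpenBracket, tokstream)
            # Hypothetical: an extra reference, so the wrapper is not
            # reclaimed (and closed) as soon as the recursive call returns.
            keepalive.append(wrapper)
            result.append(parse_toks(wrapper, keepalive))
        elif tok == ')':
            if not leading_brak:
                raise SyntaxError("Very bad ')'.")
            break
        else:
            result.append(tok)
    return sum(result)


keepalive = []
total = parse_toks(tokenize("12(34)21"), keepalive)
print(total)  # 13
```

This is a diagnostic, not a fix: the result depends on when the wrapper happens to be released, which is the fragility the non-delegating push_on_stream removes.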

wizzwizz4 answered Mar 11 '23 09:03
