
lemon parser parsing 0 token

Tags: c, parsing, lemon

I'm having a problem using (reentrant) Flex + Lemon for parsing. I'm using a simple grammar and lexer here. When I run it, I type a number followed by EOF (Ctrl-D), and the output reads:

89

found int of .
AST=0.

The first line is the number I typed in. In theory, the AST value should be the sum of everything I entered.

EDIT: when I call Parse() manually it runs correctly.

Also, lemon appears to run the atom ::= INT rule even when the token is 0 (the stop token). Why is this? I'm very confused about this behavior and I can't find any good documentation.

asked Dec 20 '13 at 22:12 by semisight


1 Answer

Okay, I figured it out. The reason is that there is a particularly nasty (and poorly documented) interaction going on between flex and lemon.

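To save memory, lemon does not copy the data behind a token value: it stores the pointer you pass to Parse() on its internal stack and only reads it later, when a rule is reduced. Flex also saves memory by reusing its own buffer, so the text that yyget_text points to changes as the lexer moves on to the next token. By the time lemon reduces atom ::= INT, the pointer it saved refers to whatever flex scanned most recently (here, the empty text at EOF). The offending line in my example is: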

// in the do loop of main.c...
Parse(parser, token, yyget_text(lexer));

This should be:

Parse(parser, token, strdup(yyget_text(lexer)));

This ensures that the text lemon reads when it later reduces the stack is the same text you originally passed in.
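To make the interaction concrete, here is a minimal sketch that needs neither flex nor lemon. The names fake_lex, fake_parse, and first_token_survives are hypothetical stand-ins: the first imitates a scanner that reuses one buffer for every token, the second imitates a parser stack that stores pointers without copying. It is an illustration of the aliasing problem under those assumptions, not the real generated code.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() on strict compilers */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for flex: a single buffer reused for every token,
   similar in spirit to the pointer returned by yyget_text(). */
static char scan_buf[32];

char *fake_lex(const char *text) {
    assert(strlen(text) < sizeof scan_buf);
    strcpy(scan_buf, text);
    return scan_buf;
}

/* Hypothetical stand-in for lemon's stack: it keeps pointers, copies nothing. */
#define MAX_SAVED 8
static char *saved[MAX_SAVED];
static int n_saved = 0;

void fake_parse(char *value) {
    assert(n_saved < MAX_SAVED);
    saved[n_saved++] = value;
}

/* Feeds "89" and then the empty text seen at EOF.
   Returns 1 if the first saved token still reads "89" afterwards. */
int first_token_survives(int copy) {
    const char *inputs[] = { "89", "" };
    n_saved = 0;
    for (int i = 0; i < 2; i++) {
        char *text = fake_lex(inputs[i]);
        char *value = copy ? strdup(text) : text;
        assert(value != NULL);
        fake_parse(value);
    }
    int ok = strcmp(saved[0], "89") == 0;
    if (copy) {
        /* Copies are owned by us, so release them. */
        for (int i = 0; i < n_saved; i++) free(saved[i]);
    }
    return ok;
}
```

Calling first_token_survives(0) returns 0, because the saved pointer aliases the scanner buffer and that buffer now holds the empty EOF text. Calling first_token_survives(1) returns 1, because each token was copied before it was saved.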

(Note: strdup allocates memory, so you have to free it at some point. Lemon lets you declare token destructors with the %token_destructor directive, which can do this. If you're building an AST that keeps the token text, free it when the AST itself is destroyed.)
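For the AST case, one possible arrangement is to let the node own the copied text and release it in the node's own free function. The IntNode type and the int_node_new and int_node_free helpers below are hypothetical names used for illustration only; they are not part of lemon or of the original example.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() in callers on strict compilers */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST node for an integer literal. The node owns `text`. */
typedef struct IntNode {
    char *text;   /* heap copy, e.g. the result of strdup(yyget_text(lexer)) */
    long  value;  /* parsed numeric value */
} IntNode;

/* Takes ownership of `text`; the caller must not free it afterwards.
   The caller is expected to have checked strdup's result already. */
IntNode *int_node_new(char *text) {
    assert(text != NULL);
    IntNode *n = malloc(sizeof *n);
    if (n == NULL) {
        free(text);   /* we own it, so do not leak it on failure */
        return NULL;
    }
    n->text = text;
    n->value = strtol(text, NULL, 10);
    return n;
}

/* Releases the node together with the token text it owns. */
void int_node_free(IntNode *n) {
    if (n == NULL) return;
    free(n->text);
    free(n);
}
```

With this layout, a grammar action would hand the token text to int_node_new, and the single call to int_node_free at the end of the AST's lifetime releases the strdup'd copy as well.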

answered Oct 12 '22 at 09:10 by semisight