Currently I'm working on a source-to-source compiler and I have already written a bison parser that creates the AST for the input correctly. I need to do several transformations on the syntax tree now and therefore I need to insert many nodes to the tree.
I could create all the structs/unions that I want to add to the syntax tree manually, but this seems to be very much work.
It would be much easier for me to create a string and I want this string to be parsed by the parser I already have. The parser should then return the tree for this string, which I can insert in my original syntax tree.
Unfortunately, the string cannot be parsed with the start rule of my parser, since it must be parsed by a subrule (e.g. my parser parses a list of functions containing statements, the string is a single statement).
How can I make bison parse a string, starting with a rule different from the start rule?
Thanks in advance!
There is a simple hack, which is described in the bison FAQ..
Basically, for each non-terminal which you would like to be able to use, you create one pseudo-token, and then you create a "meta-start" non-terminal which selects the terminal you want to use:
%token START_PROGRAM
%token START_STATEMENT
%token START_EXPRESSION
%start meta_start
%%
meta_start: START_PROGRAM program
| START_STATEMENT statement
| START_EXPRESSION expression
;
(In the action for each of those productions, you would save the value of $2
somewhere that your caller can get at it.)
Now you just need to arrange for your lexer to deliver the correct start token. You can do this by using a pure parser and a pure lexer and delivering the message through a shared datastructure. That would be the best way to do it, but for the purposes of this answer I'll just show how to do it with a global variable, since the principle is the same:
extern int start_token;
// If all your start non-terminals produce a value of the same type, possibly a union
// type, you could arrange to return it instead of the error report.
int yyparse(int start) {
// I left out the code to cause the lexer to scan a given string.
start_token = start;
return real_yyparse();
}
int yylex() {
int rv = start_token;
if (start_token)
start_token = 0;
else
rv = real_yylex();
return rv;
}
Given what you seem to be willing to do, you might find interesting ideas to recycle from this paper.
It provides means to support concrete syntax from C++ code, things like (here, desugaring from the Bison parser itself):
exp: exp "&" exp
{
// Sub−ASTs are used as−is (they are not printed, then parsed,
// as in the previous example). Spaces are no longer critical.
$$ = parse(Tweast() <<
"if" << $1 << "then" << $3 << "<> 0 else 0");
};
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With