Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Make bison start parsing with a rule other than the start rule

Tags:

bison

Currently I'm working on a source-to-source compiler and I have already written a bison parser that creates the AST for the input correctly. I need to do several transformations on the syntax tree now and therefore I need to insert many nodes to the tree.

I could create all the structs/unions that I want to add to the syntax tree manually, but this seems to be very much work.

It would be much easier for me to create a string and I want this string to be parsed by the parser I already have. The parser should then return the tree for this string, which I can insert in my original syntax tree.

Unfortunately, the string cannot be parsed with the start rule of my parser, since it must be parsed by a subrule (e.g. my parser parses a list of functions containing statements, the string is a single statement).

How can I make bison parse a string, starting with a rule different from the start rule?

Thanks in advance!

like image 760
pkreutzer Avatar asked Dec 25 '22 21:12

pkreutzer


2 Answers

There is a simple hack, which is described in the bison FAQ..

Basically, for each non-terminal which you would like to be able to use, you create one pseudo-token, and then you create a "meta-start" non-terminal which selects the terminal you want to use:

%token START_PROGRAM
%token START_STATEMENT
%token START_EXPRESSION

%start meta_start

%%

meta_start: START_PROGRAM program
          | START_STATEMENT statement
          | START_EXPRESSION expression
          ;

(In the action for each of those productions, you would save the value of $2 somewhere that your caller can get at it.)

Now you just need to arrange for your lexer to deliver the correct start token. You can do this by using a pure parser and a pure lexer and delivering the message through a shared datastructure. That would be the best way to do it, but for the purposes of this answer I'll just show how to do it with a global variable, since the principle is the same:

extern int start_token;

// If all your start non-terminals produce a value of the same type, possibly a union
// type, you could arrange to return it instead of the error report.

int yyparse(int start) {
  // I left out the code to cause the lexer to scan a given string.
  start_token = start;
  return real_yyparse();
}

int yylex() {
  int rv = start_token;
  if (start_token)
    start_token = 0;
  else
    rv = real_yylex();
  return rv;
}
like image 136
rici Avatar answered Feb 15 '23 11:02

rici


Given what you seem to be willing to do, you might find interesting ideas to recycle from this paper.

It provides means to support concrete syntax from C++ code, things like (here, desugaring from the Bison parser itself):

exp: exp "&" exp
{
  // Sub−ASTs are used as−is (they are not printed, then parsed,
  // as in the previous example). Spaces are no longer critical.
  $$ = parse(Tweast() <<
             "if" << $1 << "then" << $3 << "<> 0 else 0");
};
like image 31
akim Avatar answered Feb 15 '23 11:02

akim