Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

BNF to Lex to Parser in C [closed]

I am trying to learn the concepts and how to create a lexical analyser and parser in C from BNF notation, not EBNF. I would like to learn it in the language of C.

Can anyone explain to me the parts of the BNF i use to put in the lexical analyser and the parser in C and where to put them please? Such as an example used as well maybe? I have found that in the parser, you put terminals, non terminals, tokens, types, etc...

Sorry if thats unclear or anything, my head is all over the place in this

Thank you

ps. BNF i have

<for_statement> ::= FOR <identifier> 
IS <expression> BY <expression> TO <expression> DO <statement_list> ENDFOR

Lexical analyser code snippit

ENDP                   printf("keyword: ENDP\n");
DECLARATIONS           printf("keyword: DECLARATIONS\n");
CODE                   printf("keyword: CODE\n");
"OF TYPE"              printf("keyword: OF TYPE\n");
like image 627
Tanya Tazzy Hegarty Avatar asked Dec 03 '25 07:12

Tanya Tazzy Hegarty


1 Answers

premise: I think that a much better way to learn about the concepts you speak of could be to put the available tools to work. You could start with the basics, like flex/bison, or -better- some recent free tool, like GOLD.

Today lexers and parsers aren't implemented by hand, due to the complexity of the underlying state machines. But for learning purpose, as advocated for instance by Niklaus Wirth, the 'father' of Pascal, or to 'bootstrap' the SW chain tools, or (in the past) also for efficiency reasons, the lexer/parser was sometime implemented by hand. The lexer usually take the form of many big switch(s), and the lookahead can be naively implemented using ungetc():

#include <stdio.h>
enum token {Num, Sym};

token getnum() {
 char c;
 for ( ; ; )
  switch (c = getc()) {
  case '0':
  ...
  case '9':
   break;
  default:
   ungetc(c);
   return Num;
  }
}
token getsym() {
 char c;
 for ( ; ; )
  switch (c = getc()) {
  case '0':
  ...
  case '9':
   break;
  default:
   ungetc(c);
   return Sym;
 }
}
token lex() {
 char c;
 switch (c = getc()) {
 case '0':
 ...
 case '9':
  return getnum();
 case 'A':
 ...
 case 'Z':
 case 'a':
 ...
 case 'z':
 case '_':
  return getsym();
 }

The simplest parser it's the top-down recursive, that requires your grammar being LL(1). Notably the Pascal parser was of this kind (of course, Wirth designed appropriately the language). We need a function for each nonterminal, and a lookahead of 1 token. The parser becomes a set of mutually recursive procedures (totally fictious pseudocode here):

void A(token t);
void B(token t);

void B(token t)
{
 if (t == Sym)
 {
  t = lex();
  B(t);
 }
 else if (t == Num)
 {
  t = lex();
  A(t);
 }
 else
  syntaxerr();
}
void A(token t)
{
 if (t == Num)
 {
  t = lex();
  B(t);
 }
 else
  syntaxerr();
}

int main(int argc, char **argv)
{
 A(lex());
}

We can always rewrite EBNF in BFN, introducing service non terminals.

like image 179
CapelliC Avatar answered Dec 05 '25 21:12

CapelliC



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!