I am trying to learn the concepts and how to create a lexical analyser and parser in C from BNF notation, not EBNF. I would like to learn it in the language of C.
Can anyone explain to me the parts of the BNF i use to put in the lexical analyser and the parser in C and where to put them please? Such as an example used as well maybe? I have found that in the parser, you put terminals, non terminals, tokens, types, etc...
Sorry if thats unclear or anything, my head is all over the place in this
Thank you
ps. BNF i have
<for_statement> ::= FOR <identifier>
IS <expression> BY <expression> TO <expression> DO <statement_list> ENDFOR
Lexical analyser code snippit
ENDP printf("keyword: ENDP\n");
DECLARATIONS printf("keyword: DECLARATIONS\n");
CODE printf("keyword: CODE\n");
"OF TYPE" printf("keyword: OF TYPE\n");
premise: I think that a much better way to learn about the concepts you speak of could be to put the available tools to work. You could start with the basics, like flex/bison, or -better- some recent free tool, like GOLD.
Today lexers and parsers aren't implemented by hand, due to the complexity of the underlying state machines. But for learning purpose, as advocated for instance by Niklaus Wirth, the 'father' of Pascal, or to 'bootstrap' the SW chain tools, or (in the past) also for efficiency reasons, the lexer/parser was sometime implemented by hand. The lexer usually take the form of many big switch(s), and the lookahead can be naively implemented using ungetc():
#include <stdio.h>
enum token {Num, Sym};
token getnum() {
char c;
for ( ; ; )
switch (c = getc()) {
case '0':
...
case '9':
break;
default:
ungetc(c);
return Num;
}
}
token getsym() {
char c;
for ( ; ; )
switch (c = getc()) {
case '0':
...
case '9':
break;
default:
ungetc(c);
return Sym;
}
}
token lex() {
char c;
switch (c = getc()) {
case '0':
...
case '9':
return getnum();
case 'A':
...
case 'Z':
case 'a':
...
case 'z':
case '_':
return getsym();
}
The simplest parser it's the top-down recursive, that requires your grammar being LL(1). Notably the Pascal parser was of this kind (of course, Wirth designed appropriately the language). We need a function for each nonterminal, and a lookahead of 1 token. The parser becomes a set of mutually recursive procedures (totally fictious pseudocode here):
void A(token t);
void B(token t);
void B(token t)
{
if (t == Sym)
{
t = lex();
B(t);
}
else if (t == Num)
{
t = lex();
A(t);
}
else
syntaxerr();
}
void A(token t)
{
if (t == Num)
{
t = lex();
B(t);
}
else
syntaxerr();
}
int main(int argc, char **argv)
{
A(lex());
}
We can always rewrite EBNF in BFN, introducing service non terminals.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With