What is the difference between a top down and bottom up grammar? An example would be awesome.
First of all, the grammar itself isn't top-down or bottom-up, the parser is (though there are grammars that can be parsed by one but not the other).
From a practical viewpoint, the main difference is that most hand-written parsers are top-down, while a much larger percentage of machine-generated parsers are bottom-up (though, of course, the reverse is certainly possible).
A top-down parser typically uses recursive descent, which typically means a structure something like this (using typical mathematical expressions as an example):
expression() { term() [-+] expression }
term() { factor() [*/] term() }
factor() { operand() | '(' expression() ')' }
A bottom-up parser work in the reverse direction -- where a recursive descent parser starts from the full expression, and breaks it down into smaller and smaller pieces until it reaches the level of individual tokens, a bottom-up parser starts from the individual tokens, and uses tables of rules about how those tokens fit together into higher and higher levels of the expression hierarchy until it reaches the top level (what's represented as "expression" above).
Edit: To clarify, perhaps it would make sense to add a really trivial parser. In this case, I'll just do the old classic of converting a simplified version of a typical mathematical expression from infix to postfix:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void expression(void);
void show(int ch) {
putchar(ch);
putchar(' ');
}
int token() {
int ch;
while (isspace(ch=getchar()))
;
return ch;
}
void factor() {
int ch = token();
if (ch == '(') {
expression();
ch = token();
if (ch != ')') {
fprintf(stderr, "Syntax error. Expected close paren, found: %c\n", ch);
exit(EXIT_FAILURE);
}
}
else
show(ch);
}
void term() {
int ch;
factor();
ch = token();
if (ch == '*' || ch == '/') {
term();
show(ch);
}
else
ungetc(ch, stdin);
}
void expression() {
int ch;
term();
ch = token();
if (ch == '-' || ch=='+') {
expression();
show(ch);
}
else
ungetc(ch, stdin);
}
int main(int argc, char **argv) {
expression();
return 0;
}
Note that the lexing here is pretty stupid (it basically just accepts a single character as a token) and the expressions allowed are quite limited (only +-*/). OTOH, it's good enough to handle an input like:
1+2*(3+4*(5/6))
from which it does produce what I believe is correct output:
1 2 3 4 5 6 / * + * +
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With