Not a school related question, but it comes up in the Dragon Book (Compilers: Principles, Techniques, and Tools) in an exercise:
The grammar:
S ::= aSa | aa
generates all even length strings of a's except for the empty string.
a) Construct a recursive-descent parser with backtracking for this grammar that tries the alternative aSa before aa. Show that the procedure for S succeeds on 2, 4, or 8 a's, but fails on 6 a's. b) What language does your parser recognize?
I'm stumped. It seems like if 4 a's is recognized as S, and two a's between an S is recognized, then 6 a's should be recognized. I tried implementing the parser in C but this one recognizes all even numbers of a's as well. It's not failing to recognize 6 a's. What does this exercise have in mind?
/* A C implementation of Exercise 4.13 in the Dragon Book */
/* The grammar:
S ::= aSa | aa
*/
/* Construct a recursive-descent parser with backtracking for this grammar
that tries the alternative aSa before aa. Show that the procedure for S
succeeds on 2, 4, or 8 a's, but fails on 6 a's.
*/
#include <string.h>
#include <stdio.h>
int S(const char *str, int start, int end);
int aSa(const char *str, int start, int end);
int aa(const char *str, int start, int end);
/* returns 1 if a match, 0 otherwise */
int S(const char *str, int start, int end)
{
if(aSa(str, start, end))
return 1;
else
if(aa(str, start, end))
return 1;
return 0;
}
/* returns 1 if a match, 0 otherwise */
int aSa(const char *str, int start, int end)
{
int len = end - start;
if (len < 3)
return 0;
if(str[0] != 'a')
return 0;
if (!S(str, start+1, end-1))
return 0;
if(str[len-1] != 'a')
return 0;
return 1;
}
/* returns 1 if a match, 0 otherwise */
int aa(const char *str, int start, int end)
{
int len = end - start;
if(len != 2)
return 0;
if(str[0] == str[1] && str[0] == 'a')
return 1;
return 0;
}
int main()
{
char str[20];
printf("Enter a string: \n");
scanf("%s", str);
int match = S(str, 0, strlen(str));
if(match)
printf("The string %s matches\n", str);
else
printf("The string %s does not match\n", str);
return 0;
}
Backtracking : It means, if one derivation of a production fails, the syntax analyzer restarts the process using different rules of same production. This technique may process the input string more than once to determine the right production.
Backtracking is still useful, but if you need backtracking you can use a dedicated parser for it: OneOf . It's a parser that tries multiple parsers on the same input. If one fails, it backtracks the input to its original state, and if one succeeds, then that output is returned and no other parsers are tried.
A parser is a compiler or interpreter component that breaks data into smaller elements for easy translation into another language. A parser takes input in the form of a sequence of tokens, interactive commands, or program instructions and breaks them up into parts that can be used by other components in programming.
It can be used to check whether or not a string belongs to a given language. When a statement written in a programming language is input, it is parsed by a compiler to check whether or not it is syntactically correct and to extract components if it is correct.
The problem isn't the fact that this is a backtracking or recursive descent parser; the problem is that the described implementation does not properly consider the outer context of the recursive descent parse. This is similar to the difference between a Strong LL (SLL) parser and an LL parser.
The shortest input for which the strange behavior is demonstrated is aaaaaa
.
S
, and match the 1sta
.S
.
a
.S
. I'll omit the specific steps, but the key is this invocation of S
matches aaaa
, which is the 3rda
through the end of the input. (See note that follows.)a
, but since the end of the input was already reached, we go back and match just the 2nd through 3rdaa
.a
.Additional note about the inner call to S
that matched aaaa
: If we knew to reserve an a
at the end of the input for step 3, then the inner call to S
could have matched aa
instead of aaaa
, leading to a successful parse of the complete input aaaaaa
. ANTLR 4 provides this "full context" parsing ability in a recursive descent parser, and is the first recursive descent LL parser able to correctly match aa
instead of aaaa
for this nested invocation of S
.
An SLL parser matches a2k for this grammar. A proper LL parser (such as ANTLR 4) matches a2k for this grammar.
Even with backtracking, which requires being able to rewind the input stream, a recursive descent parser is not allowed to look ahead to the end of the input, nor is it allowed to remove symbols from both ends of the stream.
A left-to-right parser must be able to work with an input stream which has only one method:
get() : consume and read one symbol, or return an EOF symbol.
The backtracking version needs a stream with two more methods:
posn = tell() : return an opaque value which can be used in seek()
seek(posn) : reposition the stream to a previous position returned by tell()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With