Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does backtracking affect the language recognized by a parser?

Tags:

c

parsing

Not a school related question, but it comes up in the Dragon Book (Compilers: Principles, Techniques, and Tools) in an exercise:

The grammar:

S ::= aSa | aa

generates all even length strings of a's except for the empty string.

a) Construct a recursive-descent parser with backtracking for this grammar that tries the alternative aSa before aa. Show that the procedure for S succeeds on 2, 4, or 8 a's, but fails on 6 a's. b) What language does your parser recognize?

I'm stumped. It seems like if 4 a's is recognized as S, and two a's between an S is recognized, then 6 a's should be recognized. I tried implementing the parser in C but this one recognizes all even numbers of a's as well. It's not failing to recognize 6 a's. What does this exercise have in mind?

/* A C implementation of Exercise 4.13 in the Dragon Book */

/* The grammar:

   S ::= aSa | aa

*/

/* Construct a recursive-descent parser with backtracking for this grammar 
   that tries the alternative aSa before aa. Show that the procedure for S 
   succeeds on 2, 4, or 8 a's, but fails on 6 a's. 
*/

#include <string.h>
#include <stdio.h>

int S(const char *str, int start, int end);
int aSa(const char *str, int start, int end);
int aa(const char *str, int start, int end);

/* returns 1 if a match, 0 otherwise */
int S(const char *str, int start, int end)
{
  if(aSa(str, start, end))
    return 1;
  else
    if(aa(str, start, end))
      return 1;
  return 0;
}

/* returns 1 if a match, 0 otherwise */
int aSa(const char *str, int start, int end)
{
  int len = end - start;
  if (len < 3)
    return 0;
  if(str[0] != 'a')
    return 0;
  if (!S(str, start+1, end-1))
    return 0;
  if(str[len-1] != 'a')
    return 0;
  return 1;
}

/* returns 1 if a match, 0 otherwise */
int aa(const char *str, int start, int end)
{
  int len = end - start;
  if(len != 2)
    return 0;
  if(str[0] == str[1] && str[0] == 'a')
    return 1;
  return 0;
}

int main()
{
  char str[20];
  printf("Enter a string: \n");
  scanf("%s", str);
  int match = S(str, 0, strlen(str));
  if(match)
    printf("The string %s matches\n", str);
  else
    printf("The string %s does not match\n", str);
  return 0;
}
like image 469
Daniel Dilger Avatar asked Jul 03 '13 19:07

Daniel Dilger


People also ask

What is backtracking in parser?

Backtracking : It means, if one derivation of a production fails, the syntax analyzer restarts the process using different rules of same production. This technique may process the input string more than once to determine the right production.

Which parser uses backtracking?

Backtracking is still useful, but if you need backtracking you can use a dedicated parser for it: OneOf . It's a parser that tries multiple parsers on the same input. If one fails, it backtracks the input to its original state, and if one succeeds, then that output is returned and no other parsers are tried.

How does a language parser work?

A parser is a compiler or interpreter component that breaks data into smaller elements for easy translation into another language. A parser takes input in the form of a sequence of tokens, interactive commands, or program instructions and breaks them up into parts that can be used by other components in programming.

How would a parser know if a statement belongs to a given language?

It can be used to check whether or not a string belongs to a given language. When a statement written in a programming language is input, it is parsed by a compiler to check whether or not it is syntactically correct and to extract components if it is correct.


2 Answers

The problem isn't the fact that this is a backtracking or recursive descent parser; the problem is that the described implementation does not properly consider the outer context of the recursive descent parse. This is similar to the difference between a Strong LL (SLL) parser and an LL parser.

The shortest input for which the strange behavior is demonstrated is aaaaaa.

  1. We start in rule S, and match the 1sta.
  2. We invoke S.
    • We match the 2nda.
    • We invoke S. I'll omit the specific steps, but the key is this invocation of S matches aaaa, which is the 3rda through the end of the input. (See note that follows.)
    • We try to match a, but since the end of the input was already reached, we go back and match just the 2nd through 3rdaa.
  3. We match the 4tha.

Additional note about the inner call to S that matched aaaa: If we knew to reserve an a at the end of the input for step 3, then the inner call to S could have matched aa instead of aaaa, leading to a successful parse of the complete input aaaaaa. ANTLR 4 provides this "full context" parsing ability in a recursive descent parser, and is the first recursive descent LL parser able to correctly match aa instead of aaaa for this nested invocation of S.

An SLL parser matches a2k for this grammar. A proper LL parser (such as ANTLR 4) matches a2k for this grammar.

like image 63
Sam Harwell Avatar answered Jan 04 '23 15:01

Sam Harwell


Even with backtracking, which requires being able to rewind the input stream, a recursive descent parser is not allowed to look ahead to the end of the input, nor is it allowed to remove symbols from both ends of the stream.

A left-to-right parser must be able to work with an input stream which has only one method:

get() : consume and read one symbol, or return an EOF symbol.

The backtracking version needs a stream with two more methods:

posn = tell()  : return an opaque value which can be used in seek()
seek(posn)     : reposition the stream to a previous position returned by tell()
like image 38
rici Avatar answered Jan 04 '23 16:01

rici