Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing Context Sensitive Language

i am reading the Definitive ANTLR reference by Terence Parr, where he says:

Semantic predicates are a powerful means of recognizing context-sensitive language structures by allowing runtime information to drive recognition

But the examples in the book are very simple. What i need to know is: can ANTLR parse context-sensitive rules like:

xAy --> xBy

If ANTLR can't parse these rules, is there is another tool that deals with context-sensitive grammars?

like image 313
Radi Avatar asked Feb 26 '11 12:02

Radi


People also ask

What is meant by context-sensitive language?

In formal language theory, a context-sensitive language is a language that can be defined by a context-sensitive grammar (and equivalently by a noncontracting grammar). Context-sensitive is one of the four types of grammars in the Chomsky hierarchy.

Which language is context-sensitive language?

Context-sensitive Language: The language that can be defined by context-sensitive grammar is called CSL. Properties of CSL are : Union, intersection and concatenation of two context-sensitive languages is context-sensitive.

Which machine recognize context sensitive languages?

Context-sensitive grammars are one such class. These grammars generate languages that can be recognized with a restricted class of Turing machines called linear-bounded automata. A grammar G = (V, T, S, P) is context-sensitive if all productions are of the form , where and .


1 Answers

ANTLR parses only grammars which are LL(*). It can't parse using grammars for full context-sensitive languages such as the example you provided. I think what Parr meant was that ANTLR can parse some languages that require some (left) context constraints.

In particular, one can use semantic predicates on "reduction actions" (we do this for GLR parsers used by our DMS Software Reengineering Toolkit but the idea is similar for ANTLR, I think) to inspect any data collected by the parser so far, either as ad hoc side effects of other semantic actions, or in a partially-built parse tree.

For our DMS-based DMS-based Fortran front end, there's a context-sensitive check to ensure that DO-loops are properly lined up. Consider:

 DO  20, I= ...
   DO 10, J = ...
       ...
20  CONTINUE
10  CONTINUE

From the point of view of the parser, the lexical stream looks like this:

DO  <number> , <variable> =  ...
    DO <number> , <variable> = ...
         ...
<number> CONTINUE
<number> CONTINUE

How can the parser then know which DO statement goes with which CONTINUE statement? (saying that each DO matches its closest CONTINUE won't work, because FORTRAN can share a CONTINUE statement with multiple DO-heads).

We use a semantic predicate "CheckMatchingNumbers" on the reduction for the following rule:

block = 'DO' <number> rest_of_do_head newline 
         block_of_statements
         <number> 'CONTINUE' newline ; CheckMatchingNumbers

to check that the number following the DO keyword, and the number following the CONTINUE keyword match. If the semantic predicate says they match, then a reduction for this rule succeeds and we've aligned the DO head with correct CONTINUE. If the predicate fails, then no reduction is proposed (and this rule is removed from candidates for parsing the local context); some other set of rules has to parse the text.

The actual rules and semantic predicates to handle FORTRAN nesting with shared continues is more complex than this but I think this makes the point.

What you want is full context-sensitive parsing engine. I know people have built them, but I don't know of any full implementations, and don't expect them to be fast.

I did follow Quinn Taylor Jackson's MetaS grammar system for awhile; it sounded like a practical attempt to come close.

like image 191
Ira Baxter Avatar answered Oct 16 '22 08:10

Ira Baxter