i am reading the Definitive ANTLR reference by Terence Parr, where he says:
Semantic predicates are a powerful means of recognizing context-sensitive language structures by allowing runtime information to drive recognition
But the examples in the book are very simple. What i need to know is: can ANTLR parse context-sensitive rules like:
xAy --> xBy
If ANTLR can't parse these rules, is there is another tool that deals with context-sensitive grammars?
In formal language theory, a context-sensitive language is a language that can be defined by a context-sensitive grammar (and equivalently by a noncontracting grammar). Context-sensitive is one of the four types of grammars in the Chomsky hierarchy.
Context-sensitive Language: The language that can be defined by context-sensitive grammar is called CSL. Properties of CSL are : Union, intersection and concatenation of two context-sensitive languages is context-sensitive.
Context-sensitive grammars are one such class. These grammars generate languages that can be recognized with a restricted class of Turing machines called linear-bounded automata. A grammar G = (V, T, S, P) is context-sensitive if all productions are of the form , where and .
ANTLR parses only grammars which are LL(*). It can't parse using grammars for full context-sensitive languages such as the example you provided. I think what Parr meant was that ANTLR can parse some languages that require some (left) context constraints.
In particular, one can use semantic predicates on "reduction actions" (we do this for GLR parsers used by our DMS Software Reengineering Toolkit but the idea is similar for ANTLR, I think) to inspect any data collected by the parser so far, either as ad hoc side effects of other semantic actions, or in a partially-built parse tree.
For our DMS-based DMS-based Fortran front end, there's a context-sensitive check to ensure that DO-loops are properly lined up. Consider:
DO 20, I= ...
DO 10, J = ...
...
20 CONTINUE
10 CONTINUE
From the point of view of the parser, the lexical stream looks like this:
DO <number> , <variable> = ...
DO <number> , <variable> = ...
...
<number> CONTINUE
<number> CONTINUE
How can the parser then know which DO statement goes with which CONTINUE statement? (saying that each DO matches its closest CONTINUE won't work, because FORTRAN can share a CONTINUE statement with multiple DO-heads).
We use a semantic predicate "CheckMatchingNumbers" on the reduction for the following rule:
block = 'DO' <number> rest_of_do_head newline
block_of_statements
<number> 'CONTINUE' newline ; CheckMatchingNumbers
to check that the number following the DO keyword, and the number following the CONTINUE keyword match. If the semantic predicate says they match, then a reduction for this rule succeeds and we've aligned the DO head with correct CONTINUE. If the predicate fails, then no reduction is proposed (and this rule is removed from candidates for parsing the local context); some other set of rules has to parse the text.
The actual rules and semantic predicates to handle FORTRAN nesting with shared continues is more complex than this but I think this makes the point.
What you want is full context-sensitive parsing engine. I know people have built them, but I don't know of any full implementations, and don't expect them to be fast.
I did follow Quinn Taylor Jackson's MetaS grammar system for awhile; it sounded like a practical attempt to come close.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With