Parsing Context Sensitive Language

1 Answer

ANTLR accepts only grammars that are LL(*). It can't parse with grammars for fully context-sensitive languages such as the example you provided. I think what Parr meant was that ANTLR can parse some languages that require some (left) context constraints.

In particular, one can use semantic predicates on "reduction actions" (we do this for the GLR parsers used by our DMS Software Reengineering Toolkit, but I think the idea is similar for ANTLR) to inspect any data collected by the parser so far, whether as ad hoc side effects of other semantic actions or in a partially built parse tree.

For our DMS-based Fortran front end, there's a context-sensitive check to ensure that DO-loops are properly lined up. Consider:

 DO  20, I= ...
   DO 10, J = ...
       ...
20  CONTINUE
10  CONTINUE

From the point of view of the parser, the lexical stream looks like this:

DO  <number> , <variable> =  ...
    DO <number> , <variable> = ...
         ...
<number> CONTINUE
<number> CONTINUE

How can the parser then know which DO statement goes with which CONTINUE statement? (Saying that each DO matches its closest CONTINUE won't work, because FORTRAN allows multiple DO heads to share a single CONTINUE statement.)
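To make the problem concrete, here is a minimal Python sketch of what the lexer hands to a context-free parser. The tokenizer, its regular expression, and the names `tokenize` and `kinds` are illustrative assumptions, not part of DMS or ANTLR. It shows that two CONTINUE lines with different labels are indistinguishable once only the token kinds are considered.

```python
import re

# Hypothetical toy lexer: numbers, words, and single punctuation characters.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z]\w*)|(.))")

def tokenize(line):
    """Return (kind, text) pairs; the kind is all a context-free rule sees."""
    tokens = []
    for number, word, other in TOKEN_RE.findall(line):
        if number:
            tokens.append(("<number>", number))
        elif word:
            upper = word.upper()
            kind = upper if upper in ("DO", "CONTINUE") else "<variable>"
            tokens.append((kind, word))
        elif other.strip():
            # Punctuation such as ',' or '=' is its own kind.
            tokens.append((other, other))
    return tokens

def kinds(line):
    """Drop the token text and keep only the kinds."""
    return [kind for kind, _ in tokenize(line)]

# Both lines look the same to the grammar, although the labels differ.
print(kinds("20 CONTINUE"))
print(kinds("10 CONTINUE") == kinds("20 CONTINUE"))
```

The label values survive only in the token text, which is exactly the data a semantic predicate has to consult.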

We use a semantic predicate "CheckMatchingNumbers" on the reduction for the following rule:

block = 'DO' <number> rest_of_do_head newline 
         block_of_statements
         <number> 'CONTINUE' newline ; CheckMatchingNumbers

to check that the number following the DO keyword and the number following the CONTINUE keyword match. If the semantic predicate says they match, then the reduction for this rule succeeds and we've aligned the DO head with the correct CONTINUE. If the predicate fails, then no reduction is proposed (and this rule is removed from the candidates for parsing the local context); some other set of rules has to parse the text.
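The idea of a predicate-guarded rule can be sketched in Python as follows. This is not DMS's GLR machinery or ANTLR's API; it is a hand-written recursive-descent approximation under simplifying assumptions: statements are already reduced to hypothetical `("DO", label)`, `("CONTINUE", label)`, and `("STMT", text)` tuples, and shared CONTINUE statements are not handled. Only the name `check_matching_numbers` mirrors the predicate named above.

```python
def check_matching_numbers(do_label, continue_label):
    """Semantic predicate: allow the rule only if the two labels agree."""
    return do_label == continue_label

def parse_block(stmts, pos):
    """Try to match one DO block starting at stmts[pos].

    Returns (tree, next_pos) on success, or None if the rule does not apply.
    """
    if pos >= len(stmts) or stmts[pos][0] != "DO":
        return None
    do_label = stmts[pos][1]
    body, pos = parse_statements(stmts, pos + 1)
    if pos >= len(stmts) or stmts[pos][0] != "CONTINUE":
        return None
    if not check_matching_numbers(do_label, stmts[pos][1]):
        return None  # predicate failed: this rule is not a candidate here
    return ("block", do_label, body), pos + 1

def parse_statements(stmts, pos):
    """Collect plain statements and nested blocks until neither fits."""
    body = []
    while pos < len(stmts):
        kind = stmts[pos][0]
        if kind == "DO":
            result = parse_block(stmts, pos)
            if result is None:
                break
            tree, pos = result
            body.append(tree)
        elif kind == "STMT":
            body.append(stmts[pos])
            pos += 1
        else:
            break  # a CONTINUE belongs to an enclosing block
    return body, pos

# Labels line up: the inner block closes with 10, the outer with 20.
good = [("DO", "20"), ("DO", "10"), ("STMT", "X = 1"),
        ("CONTINUE", "10"), ("CONTINUE", "20")]
# Labels crossed, as in the FORTRAN fragment above.
bad = [("DO", "20"), ("DO", "10"), ("STMT", "X = 1"),
       ("CONTINUE", "20"), ("CONTINUE", "10")]

print(parse_block(good, 0))
print(parse_block(bad, 0))  # None
```

In this sketch a failed predicate simply makes the rule return None. In a GLR setting the effect is similar in spirit: the reduction is not proposed, and other rules must account for the text.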

The actual rules and semantic predicates to handle FORTRAN nesting with shared continues are more complex than this, but I think this makes the point.

What you want is a full context-sensitive parsing engine. I know people have built them, but I don't know of any full implementations, and I don't expect them to be fast.

I did follow Quinn Taylor Jackson's MetaS grammar system for a while; it sounded like a practical attempt to come close.

answered Oct 16 '22 08:10

Ira Baxter

Tags:

parsing

compiler-construction

context-sensitive-grammar

antlr

Radi
