
How to correctly parse a VB Case statement?

Tags: parsing, antlr4

I'm trying to parse VBA code. Section 5.4.2.10 of the spec defines the Select Case statement, which we've implemented as follows:

// 5.4.2.10 Select Case Statement
selectCaseStmt :
    SELECT whiteSpace? CASE whiteSpace? selectExpression endOfStatement
    caseClause*
    caseElseClause?
    END_SELECT
;
selectExpression : expression;
caseClause :
    CASE whiteSpace rangeClause (whiteSpace? COMMA whiteSpace? rangeClause)* endOfStatement block
;
caseElseClause : CASE whiteSpace? ELSE endOfStatement block;
rangeClause :
    expression
    | selectStartValue whiteSpace TO whiteSpace selectEndValue   
    | (IS whiteSpace?)? comparisonOperator whiteSpace? expression
;
selectStartValue : expression;
selectEndValue : expression;

The problem is that the expression in rangeClause is taking precedence, and makes this:

Select Case foo
    Case Is = 42
        Exit Sub
End Select

...ultimately get picked up and treated as {undeclared-variable} {EQ} {literal}, which is a problem, because Is ought to be a lexer token, not the LHS of a comparison expression:

expression whiteSpace? (EQ | NEQ | LT | GT | LEQ | GEQ | LIKE | IS) whiteSpace? expression    # relationalOp

I tried reordering the alternatives so that the expression branch has lower precedence, like this:

rangeClause :
    selectStartValue whiteSpace TO whiteSpace selectEndValue   
    | (IS whiteSpace?)? comparisonOperator whiteSpace? expression
    | expression
;

But that broke the entire grammar in all kinds of ways (~1000 failing tests in my project), so instead I tried changing rangeClause to this (I removed the optional tokens, because Is without = is actually illegal VBA code):

rangeClause :
      expression (whiteSpace TO whiteSpace expression)?                 #caseFromTo
    | (IS whiteSpace comparisonOperator whiteSpace)? expression         #caseIs
;

I then worked with the CaseFromToContext and CaseIsContext classes in the code (I had to, to keep it compiling), but again it broke ~1000 tests in my project.

Then I figured, "hey that's potentially ambiguous!" and turned it into this:

rangeClause :
      expression whiteSpace TO whiteSpace expression                    #caseFromTo
    | IS whiteSpace comparisonOperator whiteSpace expression            #caseIs
    | expression                                                        #caseExpr
;

...but no luck: the outcome was identical.

How can I make the rangeClause understand this annoying Case Is = foobar syntax? I'm using ANTLR 4.3, but we're planning to upgrade to ANTLR 4.6 soon-ish.

If additional context is needed, the complete VBAParser.g4 grammar is on github.

Mathieu Guindon asked Nov 08 '22 03:11


1 Answer

Turns out that re-ordering actually does work, but in order to keep the ambiguity out of the parse, the IS whiteSpace comparisonOperator has to come first:

rangeClause :
    (IS whiteSpace?)? comparisonOperator whiteSpace? expression
    | selectStartValue whiteSpace TO whiteSpace selectEndValue 
    | expression
;

The problem is with expression (and by extension selectStartValue and selectEndValue) which will recursively match Is = because comparisonOperator comparisonOperator is an expression match. There's probably some work that can be done to prevent comparisonOperator comparisonOperator from matching expression (it's never valid in VBA AFAIK), but the above works as a quick and dirty fix.

Basically all the above grammar does is ensure that the "invalid" comparisonOperator comparisonOperator matches as a rangeClause before it can be matched as an expression.
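
To make the effect of alternative order concrete, here is a minimal Python sketch. It is a toy model and not ANTLR's actual prediction algorithm: it only mimics the rule that, when several alternatives can match the same input, the one listed first wins. The tokenizer, the matcher functions and the classify helper are all hypothetical names invented for this illustration, and the expression matcher is deliberately permissive (it accepts Is as an operand) to reproduce the misparse described in the question.

```python
import re

# Toy model only: all names below are made up for this illustration.
COMPARISON = {"=", "<>", "<", ">", "<=", ">="}
TOKEN_RE = re.compile(r"<>|<=|>=|[=<>]|\w+")

def tokenize(text):
    return TOKEN_RE.findall(text)

def is_operand(tok):
    # Deliberately permissive: any word, including the keyword Is, is an operand.
    return re.fullmatch(r"\w+", tok) is not None

def match_expression(tokens):
    # operand (comparisonOperator operand)*, consuming every token
    if not tokens or not is_operand(tokens[0]):
        return False
    rest = tokens[1:]
    while rest:
        if len(rest) < 2 or rest[0] not in COMPARISON or not is_operand(rest[1]):
            return False
        rest = rest[2:]
    return True

def match_case_is(tokens):
    # (IS)? comparisonOperator expression
    rest = tokens[1:] if tokens and tokens[0].lower() == "is" else tokens
    return len(rest) >= 2 and rest[0] in COMPARISON and match_expression(rest[1:])

def match_from_to(tokens):
    # expression TO expression
    lowered = [t.lower() for t in tokens]
    if "to" not in lowered:
        return False
    i = lowered.index("to")
    return match_expression(tokens[:i]) and match_expression(tokens[i + 1:])

# Alternative order from the question's original rangeClause.
EXPRESSION_FIRST = [("caseExpr", match_expression),
                    ("caseFromTo", match_from_to),
                    ("caseIs", match_case_is)]

# Alternative order from this answer.
CASE_IS_FIRST = [("caseIs", match_case_is),
                 ("caseFromTo", match_from_to),
                 ("caseExpr", match_expression)]

def classify(text, alternatives):
    # Return the label of the first alternative that matches the whole input.
    tokens = tokenize(text)
    for name, matcher in alternatives:
        if matcher(tokens):
            return name
    return None

print(classify("Is = 42", EXPRESSION_FIRST))  # caseExpr (Is treated as an operand)
print(classify("Is = 42", CASE_IS_FIRST))     # caseIs
print(classify("1 To 10", CASE_IS_FIRST))     # caseFromTo
```

In this sketch, plain expressions such as foo still fall through to the last alternative, so moving the Is branch to the front changes the result only for inputs that both branches could match.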

Comintern answered Nov 15 '22 06:11