
Unindented code breaks my grammar

I have a .g4 grammar for a VBA/VB6 lexer/parser, in which the lexer skips line continuation tokens. Not skipping them breaks the parser and isn't an option. Here's the lexer rule in question:

LINE_CONTINUATION : ' ' '_' '\r'? '\n' -> skip;

The problem is that whenever a continued line starts at column 1, the parser blows up:

Sub Test()
Debug.Print "Some text " & _
vbNewLine & "Some more text"    
End Sub

I thought "Hey I know! I'll just pre-process the string I'm feeding ANTLR to insert an extra space before the underscore, and change the grammar to accept it!"

So I changed the rule like this:

LINE_CONTINUATION : WS? WS '_' NEWLINE -> skip;
NEWLINE : WS? ('\r'? '\n') WS?; 
WS : [ \t]+;

...and the test VBA code above gave me this parser error:

extraneous input 'vbNewLine' expecting WS

For now my only solution is to tell my users to properly indent their code. Is there any way I can fix that grammar rule?

(Full VBA.g4 grammar file on GitHub)

Mathieu Guindon asked Jan 05 '16 21:01


1 Answer

You basically want line continuation to be treated like whitespace.

OK, then add the lexical definition of line continuation to the WS token. WS will then pick up the line continuation, and you no longer need a separate LINE_CONTINUATION rule anywhere.

//LINE_CONTINUATION : ' ' '_' '\r'? '\n' -> skip;
NEWLINE : WS? ('\r'? '\n') WS?; 
WS : ([ \t]+)|(' ' '_' '\r'? '\n');
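
To see why folding the continuation into WS helps, here is a rough Python sketch that imitates a longest-match lexer on the test line from the question. It is only an illustration, not ANTLR itself: the token names STRING, AMP, IDENT and PUNCT, the regular expressions, and the tokenize helper are all assumptions made up for this example, and the real VBA.g4 grammar has many more rules.

```python
import re

# Simplified stand-ins for a few lexer rules (assumed, not the real VBA.g4).
COMMON = [
    ("NEWLINE", r"[ \t]*\r?\n[ \t]*"),
    ("WS", r"[ \t]+"),
    ("STRING", r'"[^"\r\n]*"'),
    ("AMP", r"&"),
    ("IDENT", r"[A-Za-z][A-Za-z0-9_]*"),
    ("PUNCT", r"[().]"),
]
CONTINUATION = r" _\r?\n"

# Question's approach: a dedicated rule whose matches are skipped.
SKIP_RULES = [("LINE_CONTINUATION", CONTINUATION)] + COMMON
# Answer's approach: the continuation is a second alternative of WS.
MERGED_RULES = [("WS", CONTINUATION)] + COMMON


def tokenize(text, rules, skipped=("LINE_CONTINUATION",)):
    """Longest match wins; on a tie, the rule listed first wins."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name not in skipped:
            tokens.append(best_name)  # skipped tokens never reach the parser
        pos += best_len
    return tokens


line = 'Debug.Print "Some text " & _\nvbNewLine & "Some more text"'
skip_tokens = tokenize(line, SKIP_RULES)
merged_tokens = tokenize(line, MERGED_RULES)
print(skip_tokens)
print(merged_tokens)
```

With the skipped rule, the space before the underscore is consumed along with the continuation, so in this sketch AMP is followed directly by IDENT, which matches the "expecting WS" complaint. With the merged rule, a WS token sits between them. One thing this sketch does not cover: if the continued line is indented, the merged rule would presumably yield two WS tokens in a row, so the parser rules may need to tolerate that.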

Ira Baxter answered Sep 24 '22 19:09