I'm parsing a language that has a statement consisting of 'code', then '{', then a body of code that I have no interest in parsing, then '}'. Ideally I'd have a rule like:
skip_code: 'code' '{' ~'}'* '}';
which would simply skip ahead to the closing curly brace. The problem is that the code being skipped can itself contain pairs of curly braces. So what I essentially need is a counter that is incremented on each '{' and decremented on each '}', with the rule ending when the counter is back to 0.
What's the best way to do this in ANTLR4? Should I hand off to a custom function when 'code' is detected, swallow the tokens and run my counter there, or is there an elegant way to express this in the grammar itself?
EDIT: Some sample code, as requested:
class foo;
    int m_bar;
    function foo_bar;
        print("hello world");
    endfunction
    code {
        // This is some C code
        void my_c_func() {
            printf("I have curly braces {} in a string!");
        }
    }
    function back_to_parsed_code;
    endfunction
endclass
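I'd use something like:
skip_code: CODE_SYM block;
block: OPEN_CURLY (~(OPEN_CURLY | CLOSE_CURLY) | block)* CLOSE_CURLY;
CODE_SYM: 'code';
OPEN_CURLY: '{';
CLOSE_CURLY: '}';
Excluding OPEN_CURLY as well as CLOSE_CURLY from the negated set means a nested '{' can only be matched by the recursive block alternative, so the two alternatives never compete for the same token. Note that this rule works on tokens, so braces inside strings or comments of the skipped code are only ignored if the lexer turns those strings and comments into tokens of their own.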
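The recursive block rule is the grammar form of the counter described in the question. For comparison, here is a minimal hand-written sketch of that counter in plain Java. The names BraceCounter and endOfBlock are made up for illustration and are not part of ANTLR. It works on raw characters and knows nothing about strings or comments, so a brace inside either would throw the count off.

```java
public final class BraceCounter {
    private BraceCounter() {}

    /**
     * Returns the index just past the '}' that closes the '{' at position
     * {@code open}, or -1 if the block is never closed.
     */
    public static int endOfBlock(String text, int open) {
        if (open < 0 || open >= text.length() || text.charAt(open) != '{') {
            throw new IllegalArgumentException("expected '{' at index " + open);
        }
        int depth = 0;
        for (int i = open; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                depth++;                      // entering a (nested) block
            } else if (c == '}' && --depth == 0) {
                return i + 1;                 // counter is back to 0: block ends here
            }
        }
        return -1;                            // ran out of input before depth hit 0
    }

    public static void main(String[] args) {
        String src = "code { void f() { g(); } } function h;";
        int open = src.indexOf('{');
        int end = endOfBlock(src, open);
        System.out.println(src.substring(open, end)); // prints "{ void f() { g(); } }"
    }
}
```

A grammar rule is usually easier to maintain than a helper like this, but the sketch makes the depth bookkeeping explicit.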
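I'd handle these code blocks in the lexer, so that an entire block comes out as a single token. A quick demo (it assumes a generated DemoLexer that defines the ID, STRING, BLOCK and ANY tokens seen in the output further down):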
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.Token;

public class Main {
    public static void main(String[] args) {
        // Note the "}}}" in the comment and the "{}" in the string literal:
        // neither should end the code block early.
        String source = "class foo;\n" +
                " int m_bar;\n" +
                " function foo_bar;\n" +
                " print(\"hello world\");\n" +
                " endfunction\n" +
                " code {\n" +
                " // This is some C code }}} \n" +
                " void my_c_func() {\n" +
                " printf(\"I have curly braces {} in a string!\");\n" +
                " }\n" +
                " }\n" +
                " function back_to_parsed_code;\n" +
                " endfunction\n" +
                "endclass";
        System.out.printf("Tokenizing:\n\n%s\n\n", source);
        // CharStreams.fromString replaces ANTLRInputStream, which is deprecated since ANTLR 4.7.
        DemoLexer lexer = new DemoLexer(CharStreams.fromString(source));
        // Print each token's symbolic name and its text, with line breaks shown as \n.
        for (Token t : lexer.getAllTokens()) {
            System.out.printf("%-20s '%s'\n",
                    DemoLexer.VOCABULARY.getSymbolicName(t.getType()),
                    t.getText().replaceAll("[\r\n]", "\\\\n")
            );
        }
    }
}
If you run the class above, the following will be printed:
Tokenizing:
class foo;
int m_bar;
function foo_bar;
print("hello world");
endfunction
code {
// This is some C code }}}
void my_c_func() {
printf("I have curly braces {} in a string!");
}
}
function back_to_parsed_code;
endfunction
endclass
ID 'class'
ID 'foo'
ANY ';'
ID 'int'
ID 'm_bar'
ANY ';'
ID 'function'
ID 'foo_bar'
ANY ';'
ID 'print'
ANY '('
STRING '"hello world"'
ANY ')'
ANY ';'
ID 'endfunction'
ID 'code'
BLOCK '{\n // This is some C code }}} \n void my_c_func() {\n printf("I have curly braces {} in a string!");\n }\n }'
ID 'function'
ID 'back_to_parsed_code'
ANY ';'
ID 'endfunction'
ID 'endclass'
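The output shows the BLOCK token swallowing the "}}}" inside the comment and the "{}" inside the string without ending early. The lexer grammar that produces it is not shown here, so the following is only a hand-written Java sketch of the same idea, not the grammar itself. It assumes the embedded code uses C-style // and /* */ comments and double-quoted strings with backslash escapes; character literals such as '{' are not handled. BlockScanner and endOfBlock are hypothetical names.

```java
public final class BlockScanner {
    private BlockScanner() {}

    /**
     * Returns the index just past the '}' matching the '{' at {@code open},
     * ignoring braces inside string literals and comments, or -1 if unclosed.
     */
    public static int endOfBlock(String s, int open) {
        if (open < 0 || open >= s.length() || s.charAt(open) != '{') {
            throw new IllegalArgumentException("expected '{' at index " + open);
        }
        int n = s.length();
        int depth = 0;
        int i = open;
        while (i < n) {
            char c = s.charAt(i);
            if (c == '"') {
                i = skipString(s, i);                 // jump past the whole literal
            } else if (s.startsWith("//", i)) {
                int nl = s.indexOf('\n', i);
                i = (nl < 0) ? n : nl;                // line comment: skip to end of line
            } else if (s.startsWith("/*", i)) {
                int close = s.indexOf("*/", i + 2);
                i = (close < 0) ? n : close + 2;      // block comment: skip past "*/"
            } else {
                if (c == '{') {
                    depth++;
                } else if (c == '}' && --depth == 0) {
                    return i + 1;                     // matching close brace found
                }
                i++;
            }
        }
        return -1;
    }

    /** Returns the index just past the closing quote of the string starting at {@code quote}. */
    private static int skipString(String s, int quote) {
        int i = quote + 1;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\') {
                i += 2;                               // skip the escaped character
            } else if (c == '"') {
                return i + 1;
            } else {
                i++;
            }
        }
        return s.length();                            // unterminated string
    }

    public static void main(String[] args) {
        String src = "code {\n"
                + "  // This is some C code }}}\n"
                + "  void my_c_func() {\n"
                + "    printf(\"I have curly braces {} in a string!\");\n"
                + "  }\n"
                + "}\n"
                + "function back_to_parsed_code;";
        int end = endOfBlock(src, src.indexOf('{'));
        System.out.println(src.substring(end).trim()); // prints "function back_to_parsed_code;"
    }
}
```

Doing this in the lexer, as in the demo, keeps the parser free of any knowledge about the embedded language: it only ever sees ID 'code' followed by one BLOCK token.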