ANTLR4 parse rule to match open/close brackets

Question

I'm parsing a language that has a statement 'code' followed by '{', followed by a bunch of code that I have no interest in parsing, followed by '}'. I'd ideally like to have a rule like:

skip_code: 'code' '{' ~'}'* '}';

...which would simply skip ahead to the closing curly brace. The problem is that the code being skipped could itself contain pairs of curly braces. So what I essentially need is a counter that increments on each '{' and decrements on each '}', ending the parse rule when the counter is back to 0.

What's the best way of doing this in ANTLR4? Should I skip off to a custom function when 'code' is detected and swallow up the tokens and run my counter, or is there some elegant way to express this in the grammar itself?

EDIT: Some sample code, as requested:

class foo;
  int m_bar;
  function foo_bar;
     print("hello world");
  endfunction
  code {
     // This is some C code
     void my_c_func() {
        printf("I have curly braces {} in a string!");
     }
  }
  function back_to_parsed_code;
  endfunction
endclass

Mike Lischke · Accepted Answer

I'd use something like:

skip_code: CODE_SYM block;
block: OPEN_CURLY (~CLOSE_CURLY | block)* CLOSE_CURLY;

CODE_SYM: 'code';
OPEN_CURLY: '{';
CLOSE_CURLY: '}';
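
The recursive block rule plays the role of the counter described in the question: every nested block is one more level of depth, and the outermost CLOSE_CURLY ends the rule. For comparison, here is a rough hand-written version of the same counting idea over a raw string. It is only a sketch, not ANTLR code; the class and method names (NaiveBraceCounter, findClosingBrace) are made up for illustration.

```java
final class NaiveBraceCounter {

    /**
     * Returns the index of the '}' that closes the '{' at position open,
     * or -1 if the block is never closed. Every brace character counts,
     * including braces that sit inside string literals or comments.
     */
    static int findClosingBrace(String text, int open) {
        if (open < 0 || open >= text.length() || text.charAt(open) != '{') {
            throw new IllegalArgumentException("expected '{' at index " + open);
        }
        int depth = 0;
        for (int i = open; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                depth++;                 // entering a nested block
            } else if (c == '}') {
                depth--;                 // leaving a block
                if (depth == 0) {
                    return i;            // back at the outermost level
                }
            }
        }
        return -1;                       // ran out of input before depth hit 0
    }

    public static void main(String[] args) {
        String source = "code { void f() { g(); } } function h;";
        int open = source.indexOf('{');
        int close = findClosingBrace(source, open);
        // Prints the skipped block, from the first '{' through its matching '}'.
        System.out.println(source.substring(open, close + 1));
    }
}
```

Note that this counts every brace it sees. In the grammar version, whether braces inside C strings or comments reach the parser as OPEN_CURLY/CLOSE_CURLY tokens depends on how the rest of the lexer is written.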

Bart Kiers · Answer

I'd handle these code blocks in the lexer. A quick demo:

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.Token;

public class Main {

    public static void main(String[] args) {

        String source = "class foo;\n" +
                "  int m_bar;\n" +
                "  function foo_bar;\n" +
                "     print(\"hello world\");\n" +
                "  endfunction\n" +
                "  code {\n" +
                "     // This is some C code }}} \n" +
                "     void my_c_func() {\n" +
                "        printf(\"I have curly braces {} in a string!\");\n" +
                "     }\n" +
                "  }\n" +
                "  function back_to_parsed_code;\n" +
                "  endfunction\n" +
                "endclass";

        System.out.printf("Tokenizing:\n\n%s\n\n", source);

        // ANTLRInputStream is deprecated since ANTLR 4.7; CharStreams.fromString(source) is the newer equivalent.
        DemoLexer lexer = new DemoLexer(new ANTLRInputStream(source));

        for (Token t : lexer.getAllTokens()){
            System.out.printf("%-20s '%s'\n",
                    DemoLexer.VOCABULARY.getSymbolicName(t.getType()),
                    // Show line breaks inside a token as a literal \n so each token prints on one line.
                    t.getText().replaceAll("[\r\n]", "\\\\n")
            );
        }
    }
}

If you run the class above, the following will be printed:

Tokenizing:

class foo;
  int m_bar;
  function foo_bar;
     print("hello world");
  endfunction
  code {
     // This is some C code }}} 
     void my_c_func() {
        printf("I have curly braces {} in a string!");
     }
  }
  function back_to_parsed_code;
  endfunction
endclass

ID                   'class'
ID                   'foo'
ANY                  ';'
ID                   'int'
ID                   'm_bar'
ANY                  ';'
ID                   'function'
ID                   'foo_bar'
ANY                  ';'
ID                   'print'
ANY                  '('
STRING               '"hello world"'
ANY                  ')'
ANY                  ';'
ID                   'endfunction'
ID                   'code'
BLOCK                '{\n     // This is some C code }}} \n     void my_c_func() {\n        printf("I have curly braces {} in a string!");\n     }\n  }'
ID                   'function'
ID                   'back_to_parsed_code'
ANY                  ';'
ID                   'endfunction'
ID                   'endclass'
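
Notice that the whole code block comes out as a single BLOCK token, even though the comment contains `}}}` and the string contains `{}`. The DemoLexer grammar itself is not reproduced here, so the following is not that grammar. It is a rough hand-written sketch in plain Java of the same idea: count braces, but skip over string/char literals and comments while doing so. The names (BlockScanner, findBlockEnd) are hypothetical, and the sketch assumes C-style literals and comments inside the block.

```java
final class BlockScanner {

    /**
     * Returns the index of the '}' closing the '{' at position open, or -1 if
     * the block is unterminated. Braces inside "..." and '...' literals,
     * line comments and C-style block comments are ignored.
     */
    static int findBlockEnd(String text, int open) {
        if (open < 0 || open >= text.length() || text.charAt(open) != '{') {
            throw new IllegalArgumentException("expected '{' at index " + open);
        }
        int depth = 0;
        int i = open;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '"' || c == '\'') {
                i = skipQuoted(text, i);
            } else if (text.startsWith("//", i)) {
                i = skipLineComment(text, i);
            } else if (text.startsWith("/*", i)) {
                i = skipBlockComment(text, i);
            } else {
                if (c == '{') {
                    depth++;
                } else if (c == '}') {
                    depth--;
                    if (depth == 0) {
                        return i;
                    }
                }
                i++;
            }
        }
        return -1;
    }

    // Returns the index just past the closing quote, or text.length() if unterminated.
    private static int skipQuoted(String text, int start) {
        char quote = text.charAt(start);
        int i = start + 1;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\\') {
                i += 2;                  // skip the backslash and the escaped character
            } else if (c == quote) {
                return i + 1;
            } else {
                i++;
            }
        }
        return text.length();
    }

    // Returns the index of the newline ending the comment, or text.length() if there is none.
    private static int skipLineComment(String text, int start) {
        int newline = text.indexOf('\n', start);
        return newline < 0 ? text.length() : newline;
    }

    // Returns the index just past the comment terminator, or text.length() if unterminated.
    private static int skipBlockComment(String text, int start) {
        int end = text.indexOf("*/", start + 2);
        return end < 0 ? text.length() : end + 2;
    }

    public static void main(String[] args) {
        String source =
                "code {\n" +
                "   // This is some C code }}} \n" +
                "   void my_c_func() {\n" +
                "      printf(\"I have curly braces {} in a string!\");\n" +
                "   }\n" +
                "}\n" +
                "function back_to_parsed_code;";
        int open = source.indexOf('{');
        int close = findBlockEnd(source, open);
        // Prints the block from the first '{' through its matching '}'.
        System.out.println(source.substring(open, close + 1));
    }
}
```

In a lexer grammar the same effect is usually had by letting the block rule match string, character and comment fragments as alternatives next to a recursive reference to itself, so the lexer never sees the braces inside them as block delimiters.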
