I am trying to use ANTLR to parse a log file. Because I am only interested in part of the log, I want to write a partial parser that processes only the important parts.
For example, I want to parse this segment:
```
[ 123 begin ]
```
So I wrote the grammar:
```
log :
    '[' INT 'begin' ']'
    ;

INT : '0'..'9'+
    ;

NEWLINE
    : '\r'? '\n'
    ;

WS
    : (' '|'\t')+ {skip();}
    ;
```
But the segment may appear in the middle of a line, for example:
```
111 [ 123 begin ] 222
```
After reading the discussion "What is the wrong with the simple ANTLR grammar?", I understand why my grammar can't process the line above.
Is there any way to make ANTLR ignore such errors and continue processing the remaining text?
Thanks for any advice! Leon
ANTLR is a powerful parser generator that you can use to read, process, execute, or translate structured text or binary files. It's widely used in academia and industry to build all sorts of languages, tools, and frameworks. Twitter search uses ANTLR for query parsing, with over 2 billion queries a day.
ANTLR can generate lexers, parsers, tree parsers, and combined lexer-parsers. Parsers can automatically generate parse trees or abstract syntax trees, which can be further processed with tree parsers. ANTLR provides a single consistent notation for specifying lexers, parsers, and tree parsers.
A lexer (often called a scanner) breaks up an input stream of characters into vocabulary symbols for a parser, which applies a grammatical structure to that symbol stream.
An ANTLR lexer creates a `Token` object after matching a lexical rule. Each request for a token starts in `Lexer.nextToken`, which calls `emit` once it has identified a token. `emit` collects information from the current state of the lexer to build the token.
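The `nextToken`/`emit` split can be illustrated with a small hand-written lexer in plain Java. This is a sketch under stated assumptions, not ANTLR's generated code or runtime API: the class `SimpleLexer`, the `Token` record and the token type names are hypothetical, chosen to match the lexer rules used in this thread (including a catch-all `ANY` token for otherwise unmatched characters). It tries the rules in a fixed order rather than reproducing ANTLR's own prediction logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer that loosely mirrors the nextToken/emit split.
// NOT ANTLR's runtime API: class, method and token names are illustrative only.
public class SimpleLexer {
    enum Type { BEGIN, OBRACK, CBRACK, INT, NEWLINE, ANY, EOF }

    record Token(Type type, String text, int start) {}

    private final String input;
    private int pos = 0;        // next character to read
    private int tokenStart = 0; // first character of the token being matched

    SimpleLexer(String input) {
        this.input = input;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9'; // same range as '0'..'9'
    }

    // Builds a Token from the lexer's current state (start offset and matched text).
    private Token emit(Type type) {
        return new Token(type, input.substring(tokenStart, pos), tokenStart);
    }

    // Matches one token and hands off to emit; spaces and tabs are skipped like WS.
    Token nextToken() {
        while (pos < input.length() && (input.charAt(pos) == ' ' || input.charAt(pos) == '\t')) {
            pos++;
        }
        tokenStart = pos;
        if (pos >= input.length()) {
            return emit(Type.EOF);
        }
        char c = input.charAt(pos);
        if (input.startsWith("begin", pos)) {
            pos += 5;
            return emit(Type.BEGIN);
        }
        if (c == '[') { pos++; return emit(Type.OBRACK); }
        if (c == ']') { pos++; return emit(Type.CBRACK); }
        if (isDigit(c)) {
            while (pos < input.length() && isDigit(input.charAt(pos))) {
                pos++;
            }
            return emit(Type.INT);
        }
        if (c == '\n') { pos++; return emit(Type.NEWLINE); }
        if (c == '\r' && pos + 1 < input.length() && input.charAt(pos + 1) == '\n') {
            pos += 2;
            return emit(Type.NEWLINE);
        }
        pos++; // fall-through: any other single character becomes an ANY token
        return emit(Type.ANY);
    }

    // Pulls tokens up to and including EOF, the way a token stream would.
    List<Token> tokenize() {
        List<Token> tokens = new ArrayList<>();
        Token t;
        do {
            t = nextToken();
            tokens.add(t);
        } while (t.type() != Type.EOF);
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : new SimpleLexer("111 [ 123 begin ] 222").tokenize()) {
            System.out.println(t);
        }
    }
}
```

Because every character falls into some rule (at worst `ANY`), this lexer never fails on unexpected input; deciding what to ignore is left to whatever consumes the tokens.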
Since `'['` might also need to be skipped in certain cases outside of `[ 123 begin ]`, there's no way to handle this in the lexer. You'll have to create a parser rule that matches the token(s) to be skipped (see the `noise` rule).
You'll also need to create a fall-through rule that matches any character if none of the other lexer rules matches (see the `ANY` rule).
A quick demo:
```
grammar T;

parse
  : ( log {System.out.println("log=" + $log.text);}
    | noise
    )*
    EOF
  ;

log : OBRACK INT BEGIN CBRACK
    ;

noise
  : ~OBRACK                  // any token except '['
  | OBRACK ~INT              // a '[' followed by any token except an INT
  | OBRACK INT ~BEGIN        // a '[', an INT and any token except a BEGIN
  | OBRACK INT BEGIN ~CBRACK // a '[', an INT, a BEGIN and any token except ']'
  ;

BEGIN   : 'begin';
OBRACK  : '[';
CBRACK  : ']';
INT     : '0'..'9'+;
NEWLINE : '\r'? '\n';
WS      : (' '|'\t')+ {skip();};
ANY     : .;
```
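To see the same idea outside the ANTLR toolchain, here is a rough plain-Java sketch of what the grammar does: a lexer with a catch-all `ANY` token, followed by a loop that reports every `OBRACK INT BEGIN CBRACK` sequence and discards everything else as noise. It uses `java.util.regex` instead of ANTLR, and the names `LogScanner`, `tokenize` and `findLogs` are hypothetical. One deliberate difference: as written, the grammar's `noise` alternatives can consume several tokens at once, so for input such as `[ [ 456 begin ]` the `OBRACK ~INT` alternative swallows both brackets and the entry is lost, whereas this sketch discards one token at a time and still finds it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-Java analogue of the grammar; not generated by or using ANTLR.
public class LogScanner {
    enum Type { BEGIN, OBRACK, CBRACK, INT, NEWLINE, ANY }

    record Token(Type type, String text) {}

    // One alternative per lexer rule; DOTALL lets the final '.' (ANY) match any character.
    private static final Pattern LEXER = Pattern.compile(
        "(begin)|(\\[)|(\\])|([0-9]+)|(\\r?\\n)|([ \\t]+)|(.)", Pattern.DOTALL);

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = LEXER.matcher(input);
        while (m.find()) {
            Type type;
            if (m.group(1) != null) type = Type.BEGIN;
            else if (m.group(2) != null) type = Type.OBRACK;
            else if (m.group(3) != null) type = Type.CBRACK;
            else if (m.group(4) != null) type = Type.INT;
            else if (m.group(5) != null) type = Type.NEWLINE;
            else if (m.group(6) != null) continue; // WS: dropped, like {skip();}
            else type = Type.ANY;                  // fall-through, like ANY : .;
            tokens.add(new Token(type, m.group()));
        }
        return tokens;
    }

    // Collects every OBRACK INT BEGIN CBRACK run; any other token is discarded as noise.
    static List<String> findLogs(List<Token> tokens) {
        List<String> logs = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            if (i + 3 < tokens.size()
                    && tokens.get(i).type() == Type.OBRACK
                    && tokens.get(i + 1).type() == Type.INT
                    && tokens.get(i + 2).type() == Type.BEGIN
                    && tokens.get(i + 3).type() == Type.CBRACK) {
                // Rebuild the entry with single spaces (the original spacing was skipped).
                StringBuilder sb = new StringBuilder();
                for (int k = i; k < i + 4; k++) {
                    if (k > i) sb.append(' ');
                    sb.append(tokens.get(k).text());
                }
                logs.add(sb.toString());
                i += 4;
            } else {
                i++; // noise: discard a single token and try again
            }
        }
        return logs;
    }

    public static void main(String[] args) {
        String input = "111 [ 123 begin ] 222\n[ 9 end ] [ [ 456 begin ]\n";
        for (String log : findLogs(tokenize(input))) {
            System.out.println("log=" + log);
        }
    }
}
```

Running `main` prints one line for `[ 123 begin ]` and one for `[ 456 begin ]`; the `[ 9 end ]` fragment and the surrounding numbers are skipped as noise.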