Can I use ANTLR to parse partial data?

Tags:

antlr

I am trying to use ANTLR to parse a log file. Because I am only interested in part of the log, I want to write a partial parser that processes only the important parts.

For example, I want to parse the segment:

[ 123 begin ]

So I wrote the grammar:

log :   
    '[' INT 'begin' ']'
    ;


INT : '0'..'9'+
    ;


NEWLINE
    : '\r'? '\n'
    ;

WS
    : (' '|'\t')+ {skip();}
    ;

But the segment may appear in the middle of a line, for example:

 111 [ 123 begin ] 222

From the discussion "What is the wrong with the simple ANTLR grammar?", I understand why my grammar can't process the line above.

Is there any way to make ANTLR ignore errors and continue processing the remaining text?

Thanks for any advice! Leon


asked Nov 04 '12 14:11

Leon Chen

1 Answer

Since a '[' may also occur outside of [ 123 begin ], where it has to be skipped, there's no way to handle this in the lexer. You'll have to create a parser rule that matches the token(s) to be skipped (see the noise rule).

You'll also need to create a fall-through lexer rule that matches any single character if none of the other lexer rules match (see the ANY rule).

A quick demo:

grammar T;

parse
    : ( log {System.out.println("log=" + $log.text);}
      | noise
      )*
      EOF
    ;

log : OBRACK INT BEGIN CBRACK
    ;

noise
    : ~OBRACK                  // any token except '['
    | OBRACK ~INT              // a '[' followed by any token except an INT
    | OBRACK INT ~BEGIN        // a '[', an INT and any token except a BEGIN
    | OBRACK INT BEGIN ~CBRACK // a '[', an INT, a BEGIN and any token except ']'
    ;

BEGIN   : 'begin';
OBRACK  : '[';
CBRACK  : ']';
INT     : '0'..'9'+;
NEWLINE : '\r'? '\n';
WS      : (' '|'\t')+ {skip();};
ANY     : .;
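
To make the idea behind the parse and noise rules more concrete, here is a minimal plain-Java sketch that does not use ANTLR at all. It tokenizes the input with a regex that mirrors the lexer rules above (including the catch-all ANY), then walks the token list looking for OBRACK INT BEGIN CBRACK and treats everything else as noise. The names LogScanner, Token, tokenize and findLogs are made up for this illustration. One assumption to be aware of: the sketch skips a single token at a time when there is no match, so it may behave differently from the noise rule on inputs such as a '[' directly followed by another '['.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a hand-written stand-in for the grammar above.
final class LogScanner {
    // Token types in the same order as the lexer rules; ANY is the fall-through.
    private static final String[] TYPES =
        {"BEGIN", "OBRACK", "CBRACK", "INT", "NEWLINE", "WS", "ANY"};

    // One named group per token type. DOTALL lets ANY match a lone '\r' or '\n' too,
    // so every input character ends up in some token.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<BEGIN>begin)|(?<OBRACK>\\[)|(?<CBRACK>\\])|(?<INT>[0-9]+)"
        + "|(?<NEWLINE>\\r?\\n)|(?<WS>[ \\t]+)|(?<ANY>.)",
        Pattern.DOTALL);

    static final class Token {
        final String type;
        final String text;
        Token(String type, String text) { this.type = type; this.text = text; }
    }

    // Split the input into tokens, dropping WS (the equivalent of skip()).
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            for (String type : TYPES) {
                if (m.group(type) != null) {
                    if (!type.equals("WS")) {
                        tokens.add(new Token(type, m.group()));
                    }
                    break;
                }
            }
        }
        return tokens;
    }

    // Return the INT text of every "[ INT begin ]" segment; all other tokens are noise.
    static List<String> findLogs(String input) {
        List<Token> t = tokenize(input);
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < t.size()) {
            boolean isLog = i + 3 < t.size()
                && t.get(i).type.equals("OBRACK")
                && t.get(i + 1).type.equals("INT")
                && t.get(i + 2).type.equals("BEGIN")
                && t.get(i + 3).type.equals("CBRACK");
            if (isLog) {
                found.add(t.get(i + 1).text);
                i += 4;  // consume the whole segment
            } else {
                i++;     // noise: skip one token and try again
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findLogs(" 111 [ 123 begin ] 222")); // prints [123]
    }
}
```

For anything beyond a fixed four-token pattern, the grammar above is the better tool; the sketch only shows why a catch-all token plus a "skip whatever is not a log" loop lets the scan continue past unrelated text.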

answered Nov 17 '22 11:11

Bart Kiers

