
Using ANTLR to parse a log file

Tags:

antlr

I'm just starting out with ANTLR and am trying to extract some patterns from a log file.

For example, the log file contains:

7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function1(selected=["red","yellow"]){}

7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function2(selected=["Rocket"]){}

7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function3(selected=["blue","yellow"]){}

7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function4(selected=["Speech"]){}

I have to parse this file to find only 'Evaluation.Input.Function1' with its values 'red' and 'yellow', and 'Evaluation.Output.Function2' with its value 'Rocket', ignoring everything else. The same applies to the other input and output functions, 3 and 4, further down. There are many such Input and Output functions, and I have to find each such set of input/output functions. Below is my attempted grammar, which is not working. Any help would be appreciated. This is my first attempt at writing a grammar and at using ANTLR, and it is becoming quite daunting.

grammar test;

tag : inputtag+ outputtag+ ;

// An input tag consists of at least one input function with one or more values
inputtag : INPUTFUNCTIONS INPUTVALUES+;

// An output tag consists of at least one output function with one or more output values
outputtag : OUTPUTFUNCTIONS OUTPUTVALUES+;

INPUTFUNCTIONS 
 : INFUNCTION1 | INFUNCTION2;

OUTPUTFUNCTIONS
 :OUTFUNCTION1 | OUTFUNCTION2;

// Possible input functions in the log file
fragment INFUNCTION1
 :'Evaluation.Input.Function1';

fragment INFUNCTION2
 :'Evaluation.Input.Function3';

//Possible values in the input functions
INPUTVALUES
 : 'red' | 'yellow' | 'blue';

// Possible output functions in the log file 
fragment OUTFUNCTION1
 :'Evaluation.Output.Function2';

fragment OUTFUNCTION2
 :'Evaluation.Output.Function4';

//Possible output values in the output functions
fragment OUTPUTVALUES
 : 'Rocket' | 'Speech';
asked Feb 16 '10 23:02 by RC.


1 Answer

When you're only interested in part of the file you're parsing, you don't need a parser, and you don't need to write a grammar for the entire format of the file. A lexer grammar with ANTLR's options{filter=true;} will suffice. That way, you grab only the tokens you defined in your grammar and ignore the rest of the file.

Here's a quick demo:

lexer grammar TestLexer;

options{filter=true;}

@lexer::members {
  public static void main(String[] args) throws Exception {
    String text = 
        "7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function1(selected=[\"red\",\"yellow\"]){}\n"+
        "\n"+
        "7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function2(selected=[\"Rocket\"]){}\n"+
        "\n"+
        "7114422 2009-07-16 15:43:07,078 [LOGTHREAD] INFO StatusLog - Task 0 input : uk.project.Evaluation.Input.Function3(selected=[\"blue\",\"yellow\"]){}\n"+
        "\n"+
        "7114437 2009-07-16 15:43:07,093 [LOGTHREAD] INFO StatusLog - Task 0 output : uk.org.project.Evaluation.Output.Function4(selected=[\"Speech\"]){}";
    ANTLRStringStream in = new ANTLRStringStream(text);
    TestLexer lexer = new TestLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    for(Object obj : tokens.getTokens()) {
        Token token = (Token)obj;
        System.out.println("> token.getText() = "+token.getText());
    }
  }
}

Input
  :  'Evaluation.Input.Function' '0'..'9'+ Params   
  ;

Output
  :  'Evaluation.Output.Function' '0'..'9'+ Params
  ;

fragment
Params
  :  '(selected=[' String ( ',' String )* '])'
  ;

fragment
String
  :  '"' ( ~'"' )* '"'
  ;

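Now generate the lexer from the grammar (assuming it is saved as TestLexer.g), compile it, and run it (on Windows, use ; instead of : as the classpath separator in the last command):

java -cp antlr-3.2.jar org.antlr.Tool TestLexer.g
javac -cp antlr-3.2.jar TestLexer.java
java -cp .:antlr-3.2.jar TestLexer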

and you'll see the following being printed to the console:

> token.getText() = Evaluation.Input.Function1(selected=["red","yellow"])
> token.getText() = Evaluation.Output.Function2(selected=["Rocket"])
> token.getText() = Evaluation.Input.Function3(selected=["blue","yellow"])
> token.getText() = Evaluation.Output.Function4(selected=["Speech"])
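
Each token's text still holds the function name and its values in one string. If you need them separately, as the question asks, a small post-processing step can split them. The following is only a sketch in plain Java, independent of ANTLR: the class TokenTextSplitter and its two methods are hypothetical names invented for this illustration, and it assumes the token text has exactly the shape printed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of ANTLR): splits the text of one
// Input/Output token into its function name and its selected values.
class TokenTextSplitter {

    // Matches one double-quoted value and captures the text between the quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Everything before the first '(' is treated as the function name.
    static String functionName(String tokenText) {
        int paren = tokenText.indexOf('(');
        return paren < 0 ? tokenText : tokenText.substring(0, paren);
    }

    // Every double-quoted run in the token text is treated as one value.
    static List<String> selectedValues(String tokenText) {
        List<String> values = new ArrayList<>();
        Matcher m = QUOTED.matcher(tokenText);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "Evaluation.Input.Function1(selected=[\"red\",\"yellow\"])";
        // Prints: Evaluation.Input.Function1 -> [red, yellow]
        System.out.println(functionName(text) + " -> " + selectedValues(text));
    }
}
```

In the demo's loop you could pass token.getText() to these two methods instead of printing the raw text.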
answered Oct 05 '22 03:10 by Bart Kiers