
ANTLR4 Lexer getTokens() returning 0 tokens

Tags: java, antlr, antlr4

I'm running code from here: https://github.com/bkiers/antlr4-csv-demo. I want to view the tokens analyzed by the lexer by adding this line:

System.out.println("Number of tokens: " + tokens.getTokens().size());

to Main.java:

public static void main(String[] args) throws Exception {  
    // the input source  
    String source =   
        "aaa,bbb,ccc" + "\n" +   
        "\"d,\"\"d\",eee,fff";  

    // create an instance of the lexer  
    CSVLexer lexer = new CSVLexer(new ANTLRInputStream(source));  

    // wrap a token-stream around the lexer  
    CommonTokenStream tokens = new CommonTokenStream(lexer);  

    // look at tokens analyzed
    System.out.println("Number of tokens: " + tokens.getTokens().size());

    // create the parser  
    CSVParser parser = new CSVParser(tokens);  

    // invoke the entry point of our grammar  
    List<List<String>> data = parser.file().data;  

    // display the contents of the CSV source  
    for(int r = 0; r < data.size(); r++) {  
      List<String> row = data.get(r);  
      for(int c = 0; c < row.size(); c++) {  
        System.out.println("(row=" + (r+1) + ",col=" + (c+1) + ") = " + row.get(c));  
      }  
    }  
  }  

The result printed out is: Number of tokens: 0. Why is the list returned by getTokens() empty? The rest of the parser code returns the data completely fine.

EDIT: So using lexer.getAllTokens() instead works, but why is the CommonTokenStream not returning the correct tokens?

CSV.g4:

grammar CSV;

@header {
  package csv;
}

file returns [List<List<String>> data]  
@init {$data = new ArrayList<List<String>>();}  
 : (row {$data.add($row.list);})+ EOF  
 ; 

row returns [List<String> list]  
@init {$list = new ArrayList<String>();}  
 : a=value {$list.add($a.val);} (Comma b=value {$list.add($b.val);})* (LineBreak | EOF)  
 ;

value returns [String val]  
 : SimpleValue {$val = $SimpleValue.text;}  
 | QuotedValue   
   { 
     $val = $QuotedValue.text; 
     $val = $val.substring(1, $val.length()-1); // remove leading- and trailing quotes 
     $val = $val.replace("\"\"", "\""); // replace all `""` with `"` 
   }  
 ;  

Comma  
 : ','  
 ;  

LineBreak  
 : '\r'? '\n'  
 | '\r'  
 ;  

SimpleValue  
 : ~[,\r\n"]+  
 ;  

QuotedValue  
 : '"' ('""' | ~'"')* '"'  
 ;  
asked Jun 05 '15 16:06 by Corey Wu


1 Answer

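Normally, the parser is responsible for initiating the lexing of the input stream. CommonTokenStream buffers tokens lazily: it pulls them from the lexer only when something asks for them, and getTokens() returns just what has been buffered so far. Before the parser has run, that is nothing, hence the 0. To initiate lexing manually, call CommonTokenStream.fill() (which is implemented in BufferedTokenStream) before calling getTokens().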
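To illustrate why the list starts out empty, here is a minimal, stdlib-only sketch of lazy token buffering. ToyLexer and LazyTokenBuffer are hypothetical stand-ins written for this example, not ANTLR classes. They only approximate the on-demand behaviour and should not be read as ANTLR's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a lexer: hands out one token per call, then "<EOF>".
class ToyLexer {
    private final Iterator<String> source;

    ToyLexer(List<String> tokens) {
        this.source = tokens.iterator();
    }

    String nextToken() {
        return source.hasNext() ? source.next() : "<EOF>";
    }
}

// Hypothetical stand-in for a buffered token stream (simplified model).
class LazyTokenBuffer {
    private final ToyLexer lexer;
    private final List<String> buffer = new ArrayList<>();
    private boolean hitEof = false;

    LazyTokenBuffer(ToyLexer lexer) {
        this.lexer = lexer;
    }

    // Returns only what has been buffered so far; never touches the lexer.
    List<String> getTokens() {
        return Collections.unmodifiableList(buffer);
    }

    // On-demand access, as a parser would use it: lex just far enough to reach index i.
    String get(int i) {
        while (buffer.size() <= i && !hitEof) {
            fetchOne();
        }
        return buffer.get(Math.min(i, buffer.size() - 1));
    }

    // Eagerly lex everything up to and including the EOF token.
    void fill() {
        while (!hitEof) {
            fetchOne();
        }
    }

    private void fetchOne() {
        String token = lexer.nextToken();
        buffer.add(token);
        if (token.equals("<EOF>")) {
            hitEof = true;
        }
    }
}

class LazyTokenDemo {
    public static void main(String[] args) {
        ToyLexer lexer = new ToyLexer(Arrays.asList("aaa", ",", "bbb", ",", "ccc"));
        LazyTokenBuffer tokens = new LazyTokenBuffer(lexer);

        // Nothing has requested a token yet, so the buffer is empty.
        System.out.println("Before fill(): " + tokens.getTokens().size()); // prints 0

        tokens.fill();

        // Five input tokens plus the EOF marker.
        System.out.println("After fill(): " + tokens.getTokens().size()); // prints 6
    }
}
```

In the question's Main.java, the corresponding change would be to call tokens.fill() on the CommonTokenStream just before printing tokens.getTokens().size().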

answered Oct 21 '22 23:10 by GRosenberg