I'm running code from here: https://github.com/bkiers/antlr4-csv-demo. I want to view the tokens analyzed by the lexer by adding this line:
System.out.println("Number of tokens: " + tokens.getTokens().size());
to Main.java:
public static void main(String[] args) throws Exception {
// the input source
String source =
"aaa,bbb,ccc" + "\n" +
"\"d,\"\"d\",eee,fff";
// create an instance of the lexer
CSVLexer lexer = new CSVLexer(new ANTLRInputStream(source));
// wrap a token-stream around the lexer
CommonTokenStream tokens = new CommonTokenStream(lexer);
// look at tokens analyzed
System.out.println("Number of tokens: " + tokens.getTokens().size());
// create the parser
CSVParser parser = new CSVParser(tokens);
// invoke the entry point of our grammar
List<List<String>> data = parser.file().data;
// display the contents of the CSV source
for(int r = 0; r < data.size(); r++) {
List<String> row = data.get(r);
for(int c = 0; c < row.size(); c++) {
System.out.println("(row=" + (r+1) + ",col=" + (c+1) + ") = " + row.get(c));
}
}
}
The result printed out is: Number of tokens: 0. Why is the list returned by getTokens() empty? The rest of the parser code returns the data completely fine.
EDIT: Using lexer.getAllTokens() instead works, but why is the CommonTokenStream not returning the correct tokens?
csv.g4:
grammar CSV;
@header {
package csv;
}
file returns [List<List<String>> data]
@init {$data = new ArrayList<List<String>>();}
: (row {$data.add($row.list);})+ EOF
;
row returns [List<String> list]
@init {$list = new ArrayList<String>();}
: a=value {$list.add($a.val);} (Comma b=value {$list.add($b.val);})* (LineBreak | EOF)
;
value returns [String val]
: SimpleValue {$val = $SimpleValue.text;}
| QuotedValue
{
$val = $QuotedValue.text;
$val = $val.substring(1, $val.length()-1); // remove leading- and trailing quotes
$val = $val.replace("\"\"", "\""); // replace all `""` with `"`
}
;
Comma
: ','
;
LineBreak
: '\r'? '\n'
| '\r'
;
SimpleValue
: ~[,\r\n"]+
;
QuotedValue
: '"' ('""' | ~'"')* '"'
;
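Normally, the parser is responsible for initiating the lexing of the input stream. CommonTokenStream is lazy: it pulls tokens from the lexer only when something asks for them, so before the parser has run, its buffer is still empty and getTokens() returns an empty list. To initiate lexing manually, call CommonTokenStream.fill() (which is implemented in BufferedTokenStream) before calling getTokens().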
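In the question's Main.java, this would mean calling tokens.fill() just before the println that uses tokens.getTokens(). To show why the order matters, here is a minimal toy model of a lazily filled token buffer. It is only a sketch: LazyTokenBuffer, fetchOne() and the comma-splitting "lexer" are hypothetical names invented for illustration, and this is not how ANTLR is implemented internally. Only the general behaviour (nothing is buffered until someone asks for it) is meant to carry over.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model, NOT ANTLR's implementation: a buffer that pulls
// tokens from its source only on demand, the way a token stream pulls
// tokens from a lexer.
final class LazyTokenBuffer {
    private final String source;
    private final List<String> buffered = new ArrayList<>();
    private int pos = 0; // position of the next unread character

    LazyTokenBuffer(String source) {
        this.source = source; // nothing is tokenized at construction time
    }

    // Stand-in for the parser asking for one more token: scans up to the
    // next comma and buffers that chunk. Returns false once input is used up.
    boolean fetchOne() {
        if (pos > source.length()) {
            return false;
        }
        int comma = source.indexOf(',', pos);
        int end = (comma < 0) ? source.length() : comma;
        buffered.add(source.substring(pos, end));
        pos = end + 1;
        return true;
    }

    // Plays the role of fill(): keep fetching until the input is exhausted.
    void fill() {
        while (fetchOne()) {
            // each call buffers one more token
        }
    }

    // Plays the role of getTokens(): only what has been buffered so far.
    List<String> getTokens() {
        return Collections.unmodifiableList(buffered);
    }
}

final class LazyTokenDemo {
    public static void main(String[] args) {
        LazyTokenBuffer tokens = new LazyTokenBuffer("aaa,bbb,ccc");
        System.out.println("Before fill: " + tokens.getTokens().size()); // prints "Before fill: 0"
        tokens.fill();
        System.out.println("After fill: " + tokens.getTokens().size());  // prints "After fill: 3"
    }
}
```

The first count is 0 for the same reason as in the question: the buffer is inspected before anything has asked it to read from its source.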