I built a filter with ANTLR 4 (the grammar itself isn't important here). The filters look something like "age > 30 AND height < 6.1".
I'll build a filter once and then use it to evaluate maybe a thousand docs. Each doc has an "age" and a "height" attribute.
What I'm not sure about is how to reuse the parser or lexer so that I can speed up the evaluation. Building a lexer and parser each time seems like a real waste of time.
The Java code is something like:
public Boolean createFilterVisitor(String input, DocFieldAccessor docFieldAccessor) {
    FilterLexer lexer = new FilterLexer(CharStreams.fromString(input));
    lexer.removeErrorListener(ConsoleErrorListener.INSTANCE);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    FilterParser parser = new FilterParser(tokens);
    parser.addErrorListener(new FilterErrorListener());
    parser.removeErrorListener(ConsoleErrorListener.INSTANCE);
    FilterVisitorImpl filterVisitor = new FilterVisitorImpl(docFieldAccessor);
    return filterVisitor.visit(parser.filter());
}
and then (pseudocode):
for doc in docs:
    createFilterVisitor(input, someAccessor)  // input is the filter string; someAccessor reads fields from doc
I tried building the lexer and parser once and then calling lexer.reset() and parser.reset() at the beginning of the loop. It seems to work (the docs it filters look reasonable), but I'm not sure I'm doing it correctly. I don't know what reset does or when I should use it.
So my question is:
I have this code. Does it work?
public class KalaFilter {
    private final String filterClause;
    private FilterLexer lexer;
    private FilterParser parser;
    @Getter
    private final FilterAnalyzer filterAnalyzer;

    public KalaFilter(String filterClause) {
        this.filterClause = filterClause;
        lexer = new FilterLexer(CharStreams.fromString(filterClause));
        lexer.removeErrorListener(ConsoleErrorListener.INSTANCE);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        parser = new FilterParser(tokens);
        parser.addErrorListener(new FilterErrorListener());
        parser.removeErrorListener(ConsoleErrorListener.INSTANCE);
        ParseTree parseTree = parser.filter();
        filterAnalyzer = new FilterAnalyzer();
        ParseTreeWalker walker = new ParseTreeWalker(); // create standard walker
        walker.walk(filterAnalyzer, parseTree);
    }

    // return the filter result by re-running the parser and visiting the new tree
    public Boolean visitFilterResult(DocFieldAccessor docFieldAccessor) {
        //lexer.reset();
        //FilterLexer lexer = new FilterLexer(CharStreams.fromString(filterClause));
        /*
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        FilterParser parser = new FilterParser(tokens);
        parser.addErrorListener(new FilterErrorListener());
        parser.removeErrorListener(ConsoleErrorListener.INSTANCE);
        */
        parser.reset();
        FilterVisitorImpl filterVisitor = new FilterVisitorImpl(docFieldAccessor);
        return filterVisitor.visit(parser.filter());
    }
}
The way your code is laid out, you're passing a string to the constructor, parsing that string in the constructor, and then parsing that exact same string again every time visitFilterResult is called. Unless you're doing something very unusual in your grammar's actions, parsing the same string should produce the exact same result each time, so there's no reason to parse it repeatedly instead of just reusing the result.
So instead of storing the string, lexer, and parser as instance variables, you should store the parse tree you get back from calling parser.filter(). Then, instead of invoking the parser in visitFilterResult, you can just invoke the visitor on the existing parse tree. That way there's no need to reset anything, because the parser is only ever invoked once (in the constructor).
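As a rough illustration of that parse-once, evaluate-many pattern, here is a minimal sketch that runs without your generated classes. Since I don't have your grammar, it uses a hand-written toy parser and a hypothetical Node interface as a stand-in for ANTLR's ParseTree; the names ParseOnceDemo, Node, Filter, doc and parseCount are all made up for this example. In your real class, the stored field would be the ParseTree returned by parser.filter(), and the evaluation step would be new FilterVisitorImpl(docFieldAccessor).visit(tree).

```java
import java.util.HashMap;
import java.util.Map;

public class ParseOnceDemo {
    /** Hypothetical stand-in for ANTLR's ParseTree: one node of a parsed filter. */
    interface Node {
        boolean eval(Map<String, Double> doc);
    }

    /** Counts parser invocations so the "parse once" behaviour can be observed. */
    static int parseCount = 0;

    /** Toy parser (not ANTLR) for clauses like "field op number AND field op number". */
    static Node parse(String clause) {
        parseCount++;
        Node tree = null;
        for (String part : clause.split(" AND ")) {
            String[] t = part.trim().split("\\s+");
            final String field = t[0];
            final String op = t[1];
            final double value = Double.parseDouble(t[2]);
            if (!op.equals(">") && !op.equals("<")) {
                throw new IllegalArgumentException("unsupported operator: " + op);
            }
            Node cmp = doc -> op.equals(">") ? doc.get(field) > value : doc.get(field) < value;
            if (tree == null) {
                tree = cmp;
            } else {
                final Node left = tree; // capture the tree built so far
                tree = d -> left.eval(d) && cmp.eval(d);
            }
        }
        return tree;
    }

    /** Plays the role of KalaFilter: keeps only the parsed tree. */
    static final class Filter {
        private final Node tree; // stored instead of the string, lexer and parser

        Filter(String clause) {
            this.tree = parse(clause); // the only parser invocation
        }

        boolean matches(Map<String, Double> doc) {
            return tree.eval(doc); // with ANTLR: run the visitor on the stored tree
        }
    }

    /** Builds a sample doc with the two attributes from the question. */
    static Map<String, Double> doc(double age, double height) {
        Map<String, Double> m = new HashMap<>();
        m.put("age", age);
        m.put("height", height);
        return m;
    }

    public static void main(String[] args) {
        Filter filter = new Filter("age > 30 AND height < 6.1");
        System.out.println(filter.matches(doc(42, 5.9))); // true
        System.out.println(filter.matches(doc(25, 5.9))); // false
        System.out.println(filter.matches(doc(42, 6.5))); // false
        System.out.println("parser invocations: " + parseCount); // 1
    }
}
```

The point to carry over is only the shape of Filter: the constructor parses, the per-doc method just walks the stored tree, and nothing ever needs to be reset.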