Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Autocomplete intergration with parser

How is auto-complete normally handled by a parser?

If we take an example, where I pass the following to a parser:

"int i=2"

then the auto-complete options for this might include:

"int i=2,"
"int i=2;"

Should auto-completion be a part of the parser?

If not, then in the case of an event based parser, I'm guessing that the parser will emit an event containing ids for those branches in the parser's state machine which are possible. The auto-complete module would then know what to print for each such state.

For a tree based parser, the parser will have to return a tree structure which contains in some way those branches that are available.

Is this how its done? Which type of parser is best for processing command strings when auto-completion is a requirement?

like image 951
Baz Avatar asked Sep 20 '12 12:09

Baz


1 Answers

You can read the lookahead set (i.e. the token types which are acceptable follows) from an LR(k) grammar, but such grammars tend to be huge. The various forms of compressing the grammar (of which LALR(1) is probably still the most common) make the lookahead set less precise (it will always include valid token types, but it might also contain invalid ones). Invalid token types in the lookahead set can also be introduced by table compression and by deliberate inclusion of error productions (included in order to improve error messages).

Reading the lookahead set from a recursive descent parser can be trickier, in part because such parsers are typically open-coded rather than depending on transition tables. However, in theory at least, an LL(k) grammar also has the possibility of computing a lookahead set, although it also might turn out to be imprecise.

The most interesting autocomplete, though, is not punctuation but symbols. A grammar is not sufficient to tell you which names are in scope at a given point, although it may be able to tell you which types of names would be feasible. You'd need to hook into the symbol table in order to get accurate autocomplete information. In languages where identifiers can be used prior to their declaration, this might be even trickier, although it's likely that the parser does keep a list of unresolved names somewhere.

Another difficulty with using parsers to generate autocompletion information is that parsers tend to be optimized for syntactically correct programs, and may not work at all after detecting a syntax error. For an IDE user, this can be truly irritating; minor punctuation errors disable autocomplete until the error is tracked down and fixed. (Personally, I found such systems too distracting to code with; I'd rather focus on what I'm writing at the moment than on missing parentheses in other parts of the code.)

IM(H)O, it would be better to use something vaguely resembling packrat parsing to pick out enough of the context at the insertion point to get a reasonable notion of what might follow. If you have access to complete correct datatype declarations, use them, but there is always the fallback of putting anything in the text that looks like a symbol into the lookahead set (although that, too, can get irritating).

Good luck, anyway.

like image 168
rici Avatar answered Oct 12 '22 22:10

rici