Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

CFG / PEG used for Code completion?

I'm wondering if it's possible to use a CFG or PEG grammar as a basis for code completion directly without modification. I have heard that code completion is in IDE's is sometimes manipulated and massaged or even hard coded so that it performs better.

I want to code complete on a small DSL so I fully understand that a grammar cannot help a code completion system with knowledge of library functions etc.

As far as I'm aware the parser itself needs to at least provide a system for querying what it expects next.

In particular I'm interested in a javascript code completion solution using peg.js or jison

like image 638
BefittingTheorem Avatar asked Aug 16 '11 12:08

BefittingTheorem


2 Answers

It is fairly easy to build Javascript editor with code completion from PEG grammar. I would describe how to do it with PEG.js. You need to extend your grammar with some lax parsing rules that allow to provide suggestions when previous statements are broken. These lax rules need to be handled conditionally or you will need two separate grammars - one for parsing source and second for code completion. You can maintain one grammar by using Javascript predicates (available in PEG.js). It looks like &{return laxParsing} and it causes that whole rule to be processed when laxParsing flag is true. You can switch between lax and strict parsing easily by setting parser's internal flag.

To provide suggestions to user easily you must modify slightly generated PEG.js parser (version 0.5) to receive in the parsing error structure position (beside the column and line) and list of expectations (beside the error message). You can copy prepared fragment from https://gist.github.com/1281239.

When you have parser then you can attach it in editor on for example CTRL+SPACE keypress. When these are pressed in text source you need to put a special unparseable sign in place of cursor (to cause a parsing error) and launch parser in lax mode. Then you receive an error with list of suggestions.

Some of suggestions are not only syntax but also they define references (e.g. entities, variables). You can trigger searching these when a particular expectation is found (e.g. VariableName). You can provide the list by parsing the same source in a different lax parsing mode (filtering only variable names).

For a working example and source to this approach you can check on https://github.com/mstefaniuk/Concrete-Freetext.

like image 82
Marcin Stefaniuk Avatar answered Nov 16 '22 09:11

Marcin Stefaniuk


PEG.js gives you quite a bit of context when it generates a SyntaxError. For example, if you have a grammar for SQL and feed it something like:

FOO > 10 A

Then PEG.js will return this:

{
  "message": "Expected \"AND\", \"ORDER BY\" or end of input but \"A\" found.",
  "expected": [
    {
      "type": "literal",
      "value": "AND",
      "description": "\"AND\""
    },
    {
      "type": "literal",
      "value": "ORDER BY",
      "description": "\"ORDER BY\""
    },
    {
      "type": "end",
      "description": "end of input"
    }
  ],
  "found": "A",
  "offset": 9,
  "line": 1,
  "column": 10,
  "name": "SyntaxError"
}

What it's saying is that it parsed characters 0–9 of the string ("FOO > 10 ") but then encountered an unexpected token at character 10. And it gives you a list of the next tokens it was expecting: FOO > 10 AND, FOO > 10 ORDER BY, FOO > 10. If you tack these onto the valid portion of the query, you'll get a good set of possible completions:

function getCompletions(pegParse, text) {
  var parsedText = pegParse(text);
  var completions = [];
  if (parsedText.expected) {
    var start = text.substr(0, parsedText.offset);
    parsedText.expected.forEach(function(expected) {
      if (expected.type != 'literal') return;
      var completion = start + expected.value;
      if (completion.substr(0, text.length) == text) {
        completions.push(completion);
      }
    });
  }
  return completions;
}

This is quite simplistic -- a real autocomplete would match more than just literals and would need some way to take advantage of context not available to the grammar, e.g. the list of arguments to a function that the user is calling.

like image 5
danvk Avatar answered Nov 16 '22 10:11

danvk