
Replace comment in JavaScript AST with subtree derived from the comment's content

I'm the author of doctest, quick and dirty doctests for JavaScript and CoffeeScript. I'd like to make the library less dirty by using a JavaScript parser rather than regular expressions to locate comments.

I'd like to use Esprima or Acorn to do the following:

  1. Create an AST
  2. Walk the tree, and for each comment node:
    1. Create an AST from the comment node's text
    2. Replace the comment node in the main tree with this subtree

Input:

!function() {

  // > toUsername("Jesper Nøhr")
  // "jespernhr"
  var toUsername = function(text) {
    return ('' + text).replace(/\W/g, '').toLowerCase()
  }

}()

Output:

!function() {

  doctest.input(function() {
    return toUsername("Jesper Nøhr")
  });
  doctest.output(4, function() {
    return "jespernhr"
  });
  var toUsername = function(text) {
    return ('' + text).replace(/\W/g, '').toLowerCase()
  }

}()

I don't know how to do this. Acorn provides a walker which takes a node type and a function, and walks the tree invoking the function each time a node of the specified type is encountered. This seems promising, but doesn't apply to comments.

With Esprima I can use esprima.parse(input, {comment: true, loc: true}).comments to get the comments, but I'm not sure how to update the tree.

davidchambers asked Feb 06 '13 06:02




2 Answers

Most AST-producing parsers throw away comments. I don't know what Esprima or Acorn do, but that might be the issue.

.... in fact, Esprima lists comment capture as a current bug: http://code.google.com/p/esprima/issues/detail?id=197

... Acorn's code is right there in GitHub. It appears to throw comments away, too.

So it looks like you either fix one of the parsers to capture the comments first, at which point your task should be straightforward, or you're stuck.

Our DMS Software Reengineering Toolkit has JavaScript parsers that capture comments in the tree. It also has language substring parsers that could be used to parse the comment text into JavaScript ASTs of whatever type the comment represents (e.g., function declaration, expression, variable declaration, ...), and the support machinery to graft such new ASTs into the main tree. If you are going to manipulate ASTs, this substring capability is likely important: most parsers won't parse arbitrary language fragments; they are wired only to parse "whole programs". For DMS, there are no comment nodes to replace; there are comments associated with AST nodes, so the grafting process is a little trickier than just "replace comment nodes". Still pretty easy.

I'll observe that most parsers (including these) read the source and break it into tokens using regular expressions or their equivalent. So if you are already using regular expressions to locate comments (which means also using them to locate *non*-comments to skip over, e.g., you need to recognize string literals that contain comment-like text and ignore them), you are doing as well as the parsers would do anyway in terms of finding the comments. And if all you want to do is replace the comments exactly with their content, echoing the source stream with the comment prefix/suffix /* */ stripped will apparently do exactly what you want, so all this parsing machinery seems like overkill.
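
To illustrate the point about string literals, here is a minimal sketch of a hand-written scanner that finds comments while skipping over quoted strings. The function name `findComments` and the record shape are made up for this example, and it deliberately ignores regular-expression and template literals, which a real tokenizer would also have to handle.

```javascript
// Hypothetical helper: returns the comments in `source` as
// {type, value, range} records, where range is [start, end) in characters.
// Assumption: the source contains no regex literals or template literals
// that hold quotes or comment-like text.
function findComments(source) {
  const comments = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (ch === '"' || ch === "'") {
      // Skip a string literal, honoring backslash escapes.
      const quote = ch;
      i += 1;
      while (i < source.length && source[i] !== quote) {
        i += source[i] === '\\' ? 2 : 1;
      }
      i += 1; // step past the closing quote
    } else if (ch === '/' && next === '/') {
      // Line comment: runs up to (not including) the newline.
      const start = i;
      while (i < source.length && source[i] !== '\n') i += 1;
      comments.push({ type: 'Line', value: source.slice(start + 2, i), range: [start, i] });
    } else if (ch === '/' && next === '*') {
      // Block comment: runs through the closing "*/" (or to end of input).
      const start = i;
      const close = source.indexOf('*/', i + 2);
      const end = close === -1 ? source.length : close + 2;
      const valueEnd = close === -1 ? source.length : close;
      comments.push({ type: 'Block', value: source.slice(start + 2, valueEnd), range: [start, end] });
      i = end;
    } else {
      i += 1;
    }
  }
  return comments;
}

const sample =
  'var url = "http://example.com"; // real comment\n' +
  "var s = '/* not a comment */'; /* block */";
const found = findComments(sample);
```

Here `found` holds only the two real comments; the `//` inside the URL string and the `/* ... */` inside the single-quoted string are skipped because the scanner consumes each string literal as a unit.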

Ira Baxter answered Sep 18 '22 15:09



You can already use Esprima to achieve what you want:

  1. Parse the code, get the comments (as an array).
  2. Iterate over the comments, see if each is what you are interested in.
  3. If you need to transform the comment, note its range. Collect all transformations.
  4. Apply the transformations from last to first so that the ranges are not shifted.

The trick here is not to change the AST. Simply apply the text changes as if you were doing a typical search-and-replace on the source string directly. Because each replacement can shift the positions of everything after it, you need to collect all the changes first and then apply them starting from the last one. For an example of how to carry out such a transformation, take a look at my blog post "From double-quotes to single-quotes" (it deals with string quotes but the principle remains the same).
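
A minimal sketch of the collect-then-apply idea follows. To keep it runnable without dependencies it does not call Esprima; instead a made-up helper, `lineComment`, builds records that are assumed to have the same shape as Esprima's comment entries (`type`, `value`, `range`, `loc`). The names `collectEdits` and `applyEdits` are also hypothetical, and the `doctest.input` / `doctest.output` wrappers follow the output shown in the question, condensed to one line each.

```javascript
// The input from the question, as a single source string.
const source = [
  '!function() {',
  '',
  '  // > toUsername("Jesper Nøhr")',
  '  // "jespernhr"',
  '  var toUsername = function(text) {',
  "    return ('' + text).replace(/\\W/g, '').toLowerCase()",
  '  }',
  '',
  '}()',
].join('\n');

// Hypothetical stand-in for the parser: builds a record for the line
// comment `text` (including its leading "//") found in `src`.
function lineComment(src, text) {
  const start = src.indexOf(text);
  const line = src.slice(0, start).split('\n').length; // 1-based line number
  return {
    type: 'Line',
    value: text.slice(2),
    range: [start, start + text.length],
    loc: { start: { line: line } },
  };
}

// Steps 2 and 3: pick the interesting comments and record what each
// range should be replaced with. Nothing is modified yet.
function collectEdits(comments) {
  const edits = [];
  let afterInput = false;
  for (const comment of comments) {
    const text = comment.value.trim();
    if (text.startsWith('> ')) {
      edits.push({
        range: comment.range,
        text: 'doctest.input(function() { return ' + text.slice(2) + ' });',
      });
      afterInput = true;
    } else {
      if (afterInput) {
        const line = comment.loc.start.line;
        edits.push({
          range: comment.range,
          text: 'doctest.output(' + line + ', function() { return ' + text + ' });',
        });
      }
      afterInput = false;
    }
  }
  return edits;
}

// Step 4: apply edits from the last range to the first, so earlier
// ranges still point at the right characters.
function applyEdits(src, edits) {
  const ordered = edits.slice().sort((a, b) => b.range[0] - a.range[0]);
  let result = src;
  for (const edit of ordered) {
    result = result.slice(0, edit.range[0]) + edit.text + result.slice(edit.range[1]);
  }
  return result;
}

const comments = [
  lineComment(source, '// > toUsername("Jesper Nøhr")'),
  lineComment(source, '// "jespernhr"'),
];
const rewritten = applyEdits(source, collectEdits(comments));
```

Sorting by descending start offset is what makes the recorded ranges safe to reuse: each replacement only changes text at or after its own start, so every edit that still has to be applied lies entirely before it.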

Last but not least, you might want to use a slightly higher-level utility such as Rocambole.

Ariya Hidayat answered Sep 19 '22 15:09
