Higher-level, semantic search-and-replace in Java code from command-line

Command-line tools like grep, sed, awk, and perl allow one to carry out textual search-and-replace operations.

However, is there any tool that would allow me to carry out semantic search-and-replace operations in a Java codebase, from command-line?

The Eclipse IDE allows me, e.g., to easily rename a variable, a field, a method, or a class. But I would like to be able to do the same from command-line.

The rename operation above is just one example. I would further like to be able to select the replacee text with additional semantic constraints such as:

  • only the scopes of methods M1, M2 of classes C, D, and E;
  • only all variables or fields of class C;
  • all expressions in which a variable of some class occurs;
  • only the scope of the class definition of a variable;
  • only the scopes of all overridden versions of method M of class C;
  • etc.

Having selected the code using such arbitrary semantic constraints, I would like to be able to then carry out arbitrary transformations on it.

So, basically, I would need access to the symbol-table of the code.

Questions:

  1. Is there an existing tool for this type of work, or would I have to build one myself?
  2. Even if I have to build one myself, do any tools or libraries exist that would at least provide me the symbol-table of Java code, on top of which I could add my own search-and-replace and other refactoring operations?

Harry asked Jun 19 '16 13:06


2 Answers

The only tool I know of that can do this easily is the long-awaited Refaster. However, it is still not possible to use it outside of Google. See [the research paper](http://research.google.com/pubs/pub41876.html) and the status on using Refaster outside of Google.

I am the author of AutoRefactor, and I am very interested in implementing this feature as part of that project. Please follow up on the GitHub issue if you would like to help.

JnRouvignac answered Oct 16 '22 06:10


What you want is the ability to find code according to syntax, constrained by various semantic conditions, and then be able to replace the found code with new syntax.

Access to the symbol table (symbol type/scope/mentions in scope) is just one kind of semantic constraint. You'll probably want others, such as control flow sequencing (this happens after that) and data flow reaching (data produced here is consumed there). In fact, there is an unbounded number of semantic conditions you might consider important, depending on the properties of the language (does this function access data in parallel to that function?) or your application interests (is this matrix an upper triangular matrix?).

In general, you can't have a tool that has all possible semantic conditions of interest off the shelf. That means you need to be able to express new semantic conditions when you discover the need for them.

The best you might hope for is a tool that

  • knows the language syntax
  • has some standard semantic properties built in (my preference is symbol tables, control and data flow analysis)
  • can express patterns on the source in terms of the source code
  • can constrain the patterns based on such semantic properties
  • can be extended with new semantic analyses to provide additional properties

There is a classic category of tools that do this, called source-to-source program transformation systems.

My company offers the DMS Software Reengineering Toolkit, which is one of these. DMS has been used to carry out production transformations at scale on a wide variety of languages (including OP's target: Java). DMS's rewrite rules are of the form:

 rule <rule_name>(syntax_parameters): syntax_category =
    <match_pattern> ->  <replacement_pattern>
    if  <semantic_condition>;

You can see much more detail on what the pattern language and rewrite rules look like here: DMS Rewrite Rules.
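
To make the three parts of such a rule concrete (match pattern, replacement pattern, semantic condition), here is a minimal sketch in plain Java. It is not DMS's rule language or API: `VarRef`, `SymbolTable`, and `RenameRule` are hypothetical names invented for this illustration, and the "symbol table" is just a map from a variable occurrence to its declared type. The sketch renames `count` only where the symbol table says the variable has type `Counter`, which is exactly the kind of distinction a textual search-and-replace cannot make.

```java
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical toy model; none of these names come from DMS or any real tool.
final class ConditionalRuleSketch {

    // One occurrence of a variable; 'scope' names the enclosing method, e.g. "C.m1".
    record VarRef(String scope, String name) {}

    // Toy symbol table: maps each variable occurrence to its declared type.
    record SymbolTable(Map<VarRef, String> declaredTypes) {
        String typeOf(VarRef ref) {
            return declaredTypes.get(ref);
        }
    }

    // match pattern -> replacement pattern, guarded by a semantic condition.
    record RenameRule(String matchName,
                      String replacementName,
                      BiPredicate<VarRef, SymbolTable> condition) {

        // Returns the rewritten occurrence, or the original if the rule does not apply.
        VarRef apply(VarRef ref, SymbolTable symbols) {
            boolean syntaxMatches = ref.name().equals(matchName);
            if (syntaxMatches && condition.test(ref, symbols)) {
                return new VarRef(ref.scope(), replacementName);
            }
            return ref;
        }
    }

    public static void main(String[] args) {
        VarRef inC = new VarRef("C.m1", "count");
        VarRef inD = new VarRef("D.m2", "count");
        SymbolTable symbols = new SymbolTable(Map.of(inC, "Counter", inD, "int"));

        // Rename 'count' to 'counter' only where its declared type is Counter.
        RenameRule rule = new RenameRule("count", "counter",
                (ref, table) -> "Counter".equals(table.typeOf(ref)));

        System.out.println(rule.apply(inC, symbols)); // renamed: the condition holds
        System.out.println(rule.apply(inD, symbols)); // unchanged: same text, different type
    }
}
```

Both occurrences have the same spelling, so `sed` would rename both; the condition on the declared type is what separates them.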

It is worth noting that the rewrite rules represent operations on trees. This means that while they might look like text string matches, they are not. Consequently, a rewrite rule matches in spite of any whitespace differences (and in DMS's case, even in spite of differences in number radix or character string escapes). This makes DMS pattern matches far more effective than a regex, and a lot easier to write, since you don't have to worry about these issues.
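
The difference between matching trees and matching text can be shown with a small, self-contained Java sketch. It is a toy, not DMS: the grammar handles only identifiers, decimal and hex literals, and `+`, and all class and method names are hypothetical. The rule "`e + 0` becomes `e`" is written once against the tree, and it then applies no matter how the source text spells the whitespace or the zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy illustration; not DMS's API or rule language.
final class TreeRewriteSketch {
    interface Expr {}
    record Num(long value) implements Expr {}
    record Var(String name) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}

    // Tokens: hex literal, decimal literal, identifier, or '+'. Whitespace is skipped.
    private static final Pattern TOKEN =
            Pattern.compile("\\s*(0[xX][0-9a-fA-F]+|\\d+|[A-Za-z_]\\w*|\\+)");

    // Parses "atom (+ atom)*" into a left-associative tree.
    static Expr parse(String source) {
        String text = source.trim();
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        for (int pos = 0; pos < text.length(); pos = m.end()) {
            m.region(pos, text.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected input at offset " + pos);
            }
            tokens.add(m.group(1));
        }
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("Empty expression");
        }
        Expr result = atom(tokens.get(0));
        for (int i = 1; i < tokens.size(); i += 2) {
            if (!tokens.get(i).equals("+") || i + 1 >= tokens.size()) {
                throw new IllegalArgumentException("Expected '+ operand' at token " + i);
            }
            result = new Add(result, atom(tokens.get(i + 1)));
        }
        return result;
    }

    private static Expr atom(String token) {
        if (token.equals("+")) {
            throw new IllegalArgumentException("Operand expected, found '+'");
        }
        if (token.startsWith("0x") || token.startsWith("0X")) {
            return new Num(Long.parseLong(token.substring(2), 16));
        }
        if (Character.isDigit(token.charAt(0))) {
            return new Num(Long.parseLong(token));
        }
        return new Var(token);
    }

    // The rewrite rule "e + 0 -> e", applied bottom-up over the whole tree.
    static Expr simplify(Expr e) {
        if (e instanceof Add add) {
            Expr left = simplify(add.left());
            Expr right = simplify(add.right());
            if (right instanceof Num n && n.value() == 0) {
                return left;
            }
            return new Add(left, right);
        }
        return e;
    }

    // Regenerates source text from the tree.
    static String print(Expr e) {
        if (e instanceof Add add) {
            return print(add.left()) + " + " + print(add.right());
        }
        if (e instanceof Num n) {
            return Long.toString(n.value());
        }
        return ((Var) e).name();
    }

    public static void main(String[] args) {
        // Different spellings of the same tree; the one rule handles both.
        System.out.println(print(simplify(parse("total+0"))));
        System.out.println(print(simplify(parse("total   +   0x0"))));
    }
}
```

A regex covering the same cases would have to enumerate every whitespace and literal spelling by hand; here the parser normalizes them away before the rule ever runs.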

This Software Recommendations link shows how one can define rules with DMS and (as per OP's request) "run them from the command line". This isn't as succinct as running sed, but then it is doing much more complex tasks.

DMS has a Java front end with symbol tables and control and data flow analysis. If one wants additional semantic analyses, one codes them in DMS's underlying programming language.

Ira Baxter answered Oct 16 '22 05:10