I have some old software (in a language that's not dead but is dead to me ;-)) that implements a basic pattern-matching and -rewriting system for source code. I am considering resurrecting this code, translating it into a modern language, and open-sourcing the project as a refactoring power-tool. Before I go much further, I want to know if anything like this exists already (my google-fu is fanning air on this tonight).
Here's how it works:
the software operates on the abstract syntax tree (AST) of the input application, and outputs a modified AST which can then be regenerated into new source code
for example, suppose we find a bunch of while-loops that really should be for-loops. The following template will match the while-loop pattern:
Template oldLoopPtrn
int @cnt@ = 0;
while (@cnt@ < @max@)
{
… @body@
++@cnt@;
}
End_Template
while the following template will specify the output rewrite pattern:
Template newLoopPtrn
for(int @cnt@ = 0; @cnt@ < @max@; @cnt@++)
{
@body@
}
End_Template
and a simple rule to associate them
Rule oldLoopPtrn --> newLoopPtrn
so code that looks like this
int i=0;
while(i<arrlen)
{
printf("element %d: %f\n",i,arr[i]);
++i;
}
gets automatically rewritten to look like this
for(int i = 0; i < arrlen; i++)
{
printf("element %d: %f\n",i,arr[i]);
}
The closest thing I've seen like this is some of the code-refactoring tools, but they seem to be geared towards interactive rewriting of selected snippets, not wholesale automated changes.
I believe that this kind of tool could supercharge refactoring, and would work on multiple languages (even HTML/CSS). I also believe that converting and polishing the code base would be a huge project that I simply cannot do alone in any reasonable amount of time.
So, anything like this out there already? If not, any obvious features (besides rewrite-rule conditions) to consider?
EDIT: The one feature of this system that I like very much is that the template patterns are fairly obvious and easy to read because they're written in the same language as the target source code, not in some esoteric mutated regex/BNF format.
I am considering resurrecting this code, translating it into a modern language, and open-sourcing the project as a refactoring power-tool.
I think it would be simply great to have such a tool freely available.
But there is a commercial product available: DMS Software Reengineering Toolkit.
The DMS Software Reengineering Toolkit is a set of tools for automating customized source program analysis, modification or translation or generation of software systems, containing arbitrary mixtures of languages ("domains"). The term "software" for DMS is very broad and covers any formal notation, including programming languages, markup languages, hardware description languages, design notations, data descriptions, etc. This toolkit is the first step towards the implementation of The Design Maintenance System™, an ambitious vision of a 21st Century software engineering environment that supports the incremental construction and maintenance of large application systems, driven by semantics and captured designs.
Also, for C source code there is the coccinelle tool:
Coccinelle is a program matching and transformation engine which provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code. Coccinelle was initially targeted towards performing collateral evolutions in Linux. Such evolutions comprise the changes that are needed in client code in response to evolutions in library APIs, and may include modifications such as renaming a function, adding a function argument whose value is somehow context-dependent, and reorganizing a data structure. Beyond collateral evolutions, Coccinelle is successfully used (by us and others) for finding and fixing bugs in systems code.
TXL is rule based rather than pattern based, so it has more capabilities but perhaps a steeper learning curve.
A Coccinelle implementation of the above rule would be:
@@ identifier cnt; expression max,E; @@ cnt = 0 ... when != cnt=E -while (cnt < max) +for (cnt=0; cnt < max; cnt++) { ... - cnt++; }
For the C code:
int main () { int i=0; printf ("hello\n"); while(i < arrlen) { printf("element %d: %f\n",i,arr[i]); ++i; } }
it gives
int main () { int i=0; printf ("hello\n"); for (i = 0; i < arrlen; i++) { printf("element %d: %f\n",i,arr[i]); } }
The ... when != cnt=E allows arbitrary code between the initialization of cnt and the while loop, but checks that cnt is not redefined. A more elaborate rule could also get rid of the initialization of cnt, if it is not used between the initialization and the while loop.
Our DMS has already been mentioned. It has transformation rules that are, as the OP put it, "easy to read because they're written in the same language as the target source code".
Here's a link that shows a complete, detailed example of pattern-matching/transformation using DMS.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With