
How to merge two ASTs?

I'm trying to implement a tool for merging different versions of some source code. Given two versions of the same source code, the idea is to parse them, generate the respective Abstract Syntax Trees (ASTs), and finally merge them into a single output source that stays grammatically consistent. The lexer and parser are those from the question "ANTLR: How to skip multiline comments".

I know there is a ParserRuleReturnScope class that should help, but getStart() and getStop() always return null :-(

Here is a snippet that shows how I modified my parser to print the rules:

parser grammar CodeTableParser;

options {
    tokenVocab = CodeTableLexer;
    backtrack = true;
    output = AST;
}

@header {
    package ch.bsource.ice.parsers;
}

@members {
    private void log(ParserRuleReturnScope rule) {
        System.out.println("Rule: " + rule.getClass().getName());
        System.out.println("    getStart(): " + rule.getStart());
        System.out.println("    getStop(): " + rule.getStop());
        System.out.println("    getTree(): " + rule.getTree());
    }
}

parse
    : codeTabHeader codeTable endCodeTable eof { log(retval); }
    ;

codeTabHeader
    : comment CodeTabHeader^ { log(retval); }
    ;

...
j3d asked Nov 12 '22 20:11



1 Answer

Assuming you have the ASTs (often difficult to get in the first place; parsing real languages is often harder than it looks), you first have to determine what they have in common and build a mapping that records that information. That's not as easy as it looks: do you count a block of code that has moved, but is exactly the same subtree, as "common"? What about two subtrees that are the same except for consistent renaming of an identifier? What about changed comments? (Most ASTs lose the comments; most programmers will think this is a really bad idea.)
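To make the "mapping" idea concrete, here is a minimal sketch in Java that only handles the simplest case above: subtrees that are exactly identical, even if they have moved. Everything in it (AstNode, SubtreeMapper, canonical()) is a hypothetical name invented for illustration, not an ANTLR API, and it assumes labels contain no spaces or parentheses. Renamed identifiers and comments are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps subtrees of tree b onto identical subtrees of tree a.
final class SubtreeMapper {
    // Returns b-node -> a-node for the largest subtrees of b that also occur in a.
    static Map<AstNode, AstNode> mapCommon(AstNode a, AstNode b) {
        Map<String, List<AstNode>> index = new HashMap<>();
        indexSubtrees(a, index);
        // Identity map: two distinct nodes are never treated as the same key.
        Map<AstNode, AstNode> mapping = new IdentityHashMap<>();
        collect(b, index, mapping);
        return mapping;
    }

    // Record every subtree of n under its canonical text form.
    private static void indexSubtrees(AstNode n, Map<String, List<AstNode>> index) {
        index.computeIfAbsent(n.canonical(), k -> new ArrayList<>()).add(n);
        for (AstNode c : n.children) indexSubtrees(c, index);
    }

    // Walk b top-down; stop descending as soon as a whole subtree is found in a.
    private static void collect(AstNode n, Map<String, List<AstNode>> index,
                                Map<AstNode, AstNode> mapping) {
        List<AstNode> hits = index.get(n.canonical());
        if (hits != null) {
            mapping.put(n, hits.get(0)); // first occurrence; a real tool must disambiguate
            return;
        }
        for (AstNode c : n.children) collect(c, index, mapping);
    }

    public static void main(String[] args) {
        AstNode a = new AstNode("block",
                new AstNode("stmt", new AstNode("x")),
                new AstNode("stmt", new AstNode("y")));
        AstNode b = new AstNode("block",
                new AstNode("stmt", new AstNode("y")),   // moved
                new AstNode("stmt", new AstNode("z")),   // new
                new AstNode("stmt", new AstNode("x")));  // moved
        System.out.println(mapCommon(a, b).size() + " subtrees of b also occur in a");
    }
}

// Hypothetical minimal AST node: a label plus ordered children.
final class AstNode {
    final String label;
    final List<AstNode> children;

    AstNode(String label, AstNode... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    // Equal strings mean structurally identical subtrees (recomputed each call; fine for a sketch).
    String canonical() {
        StringBuilder sb = new StringBuilder("(").append(label);
        for (AstNode c : children) sb.append(' ').append(c.canonical());
        return sb.append(')').toString();
    }
}
```

Treating "same except for a consistent rename" as common would need a different canonical form, for example one that replaces identifier labels with placeholders; that is a design decision the sketch deliberately leaves open.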

You can build a variation of the "Longest Common Substring" algorithm to compare trees. I've used that in tools that I have built.
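As a rough illustration of what such a variation can look like, the sketch below runs the classic longest-common-subsequence dynamic program over the child lists of two nodes and recurses into matching children. This is one possible variation chosen for brevity, not necessarily the algorithm used in the tools mentioned above; TreeNode and TreeLcs are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: sizes the largest order-preserving, top-down match of two trees.
final class TreeLcs {
    // Number of node pairs matched when a and b are aligned root to root.
    // Returns 0 if the root labels differ.
    static int common(TreeNode a, TreeNode b) {
        if (!a.label.equals(b.label)) return 0;
        int n = a.children.size();
        int m = b.children.size();
        // dp[i][j] = best total match between the first i children of a
        // and the first j children of b, keeping child order.
        int[][] dp = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int pair = common(a.children.get(i - 1), b.children.get(j - 1));
                int skip = Math.max(dp[i - 1][j], dp[i][j - 1]);
                dp[i][j] = Math.max(skip, dp[i - 1][j - 1] + pair);
            }
        }
        return 1 + dp[n][m]; // 1 for the matched roots
    }

    public static void main(String[] args) {
        TreeNode a = new TreeNode("block",
                new TreeNode("assign", new TreeNode("x"), new TreeNode("1")),
                new TreeNode("call", new TreeNode("f")),
                new TreeNode("ret"));
        TreeNode b = new TreeNode("block",
                new TreeNode("call", new TreeNode("f")),
                new TreeNode("assign", new TreeNode("x"), new TreeNode("2")),
                new TreeNode("ret"));
        System.out.println("matched nodes: " + common(a, b));
    }
}

// Hypothetical minimal AST node: a label plus ordered children.
final class TreeNode {
    final String label;
    final List<TreeNode> children;

    TreeNode(String label, TreeNode... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }
}
```

Because the match must preserve child order, a statement that has moved past another one is counted only once here; combining this with a moved-subtree lookup is one way to cover both cases.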

Finally, after you've merged the trees, you need to regenerate the text, ideally preserving most of the layout of the original code. (Programmers hate it when you change the layout they so lovingly produced.) So your ASTs need to capture position information, and your regeneration has to honor that where it can.
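A minimal sketch of layout-preserving regeneration, assuming each node remembers the character offsets it was parsed from: untouched text, including the whitespace between children, is copied straight from the original source, and only changed or inserted leaves are printed fresh. PosNode and Regenerator are hypothetical names, and the sketch assumes interior nodes always come from the original text.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns a (possibly edited) tree back into text.
final class Regenerator {
    static String regenerate(String src, PosNode n) {
        if (n.newText != null) return n.newText;                          // changed or inserted leaf
        if (n.children.isEmpty()) return src.substring(n.start, n.stop);  // untouched leaf
        StringBuilder sb = new StringBuilder();
        int cursor = n.start;
        for (PosNode c : n.children) {
            if (c.start >= 0) {
                sb.append(src, cursor, c.start); // original layout between children
                cursor = c.stop;
            }
            sb.append(regenerate(src, c));
        }
        sb.append(src, cursor, n.stop);          // trailing layout inside this node
        return sb.toString();
    }

    public static void main(String[] args) {
        String src = "a  =  1;";
        PosNode tree = PosNode.original(0, 8,
                PosNode.original(0, 1),
                PosNode.original(3, 4),
                PosNode.replaced(6, 7, "42"),    // the merge changed this token
                PosNode.original(7, 8));
        System.out.println(regenerate(src, tree));
    }
}

// Hypothetical node carrying position information into the original source.
final class PosNode {
    final int start;        // inclusive char offset, or -1 if the node was inserted
    final int stop;         // exclusive char offset, or -1 if the node was inserted
    final String newText;   // replacement text, or null if unchanged
    final List<PosNode> children;

    private PosNode(int start, int stop, String newText, PosNode... children) {
        this.start = start;
        this.stop = stop;
        this.newText = newText;
        this.children = Arrays.asList(children);
    }

    static PosNode original(int start, int stop, PosNode... children) {
        return new PosNode(start, stop, null, children);
    }

    static PosNode replaced(int start, int stop, String text) {
        return new PosNode(start, stop, text);
    }

    static PosNode inserted(String text) {
        return new PosNode(-1, -1, text);
    }
}
```

Inserted nodes have no original layout to copy, so a real tool still needs a small pretty-printer to decide the spacing around them.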

Ira Baxter answered Nov 15 '22 10:11
