Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm's name - matching subtrees in ASTs

I have a set S of "small" trees S[i] which I need to find their positions for inside a bigger which are used as patterns to find matching subtrees in a bigger tree T. I know S before I start constructing T (which is a parse tree), so I am thinking about employing a cutting-plane method to match the nodes as I go (as the parser generates the CST).

The trees in S are not the same ASTs as T - think about XPath vs. XML - S holds the tree representation of the XPaths, while T is the actual AST of the source code - I need maps between i and a vector of matching nodes of T.

However I'm not sure about the names of the algorithms I would use.

Basically I know what I want to do, it feels like a "divide et impera for trees" with a stack where I hold possible candidates to match, at each shift of the LALR parser I duplicate the top of the stack and eliminate candidates i from S[i] which won't match anyway, and after a reduction I pop from the stack. At the beginning all members from S are possible candidates.

Please note: this is just about the AST, the ASG is another story...

Addendum

Here be a parse tree T.

T - the parse tree

The parsing function will be aware of a list of what I call "treepaths", in a canonical form, also represented as trees, stored in S. But they won't look like the parsetree, they have their own language to be represented in, similar to XPath.

Example of a treepath to get all functions which have as return value a variable:

function[body[return[expr[@type="variable"]]]]]
  1. So what should I look for in the existing literature?
  2. Any other advices?
  3. Are there already languages which can query meta-annotated trees like this? An open-source C (not C++) library would be ideal.
like image 519
Flavius Avatar asked Jun 16 '11 12:06

Flavius


1 Answers

1) Your S trees as XPath correspond to some T trees. Why not construct the T trees in advance, and then pattern match them?

2) If you want to match a pattern against a structure, you can imagine compiling the pattern to some kind of state machine, that transitions when given pieces of the tree being matched against. If the state machine ever enters an accept state, you've found a match. If you have more than one pattern, each one can treated as a state machine and you can run them "in parallel" (by simulation). To make this efficient, compute the cross product of all the state machines; now there's only one, and only one transition occurs per input. This idea I call "pattern products" and you see something like in a variety of efficient matchers. One close to what you want to do is the Rete Algorithm, which keeps track of which "patterns" are live as the data fed to it changes.

like image 85
Ira Baxter Avatar answered Sep 20 '22 13:09

Ira Baxter