Regex for tree structures?

Are there regular expression equivalents for searching and modifying tree structures? Concise mini-languages (like perl regex) are what I am looking for.

Here is an example that might clarify what I am looking for.

<root>
  <node name="1">
    subtrees ....
  </node>
  <node name="2">
    <node name="2.1">
     data
    </node>
    other subtrees...
  </node>
</root>

An operation that would be possible on the above tree is "move the subtree at node 2.1 into the subtree at node 1." The result of the operation might look something like this:

<root>
  <node name="1">
    subtrees ....
    <node name="2.1">
     data
    </node>
  </node>
  <node name="2">
    other subtrees...
  </node>
</root>

Search-and-replace operations should also be supported, for example: find all nodes with at least 2 children; find all nodes whose data starts with "a" and replace it with "b" if the subtrees have at least 2 other siblings; and so on.

For strings, where the only dimension is the length of the string, we can do many of the above operations (or their 1D equivalents) using regular expressions. I wonder if there are equivalents for trees. (Instead of a single regex, you might need to write a set of transformation rules, but that is OK.)

I would like to know if there is some simple mini-language (not regex per se, but something that is as accessible as regex via libraries, etc.) to perform these operations, preferably as a Python library.

JSN asked May 17 '09 17:05


3 Answers

Tregex and Tsurgeon from Stanford are capable of doing that: Tregex matches patterns in trees, and Tsurgeon edits the trees that match. You can download the library from http://nlp.stanford.edu/software/tregex.shtml

Volkan Agun answered Nov 16 '22 22:11


I don't know of a general-purpose language that can do that, but it seems to me that you are looking for something like XPath.
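As a rough illustration of the XPath idea in Python, here is a minimal sketch using the limited XPath subset in the standard library's `xml.etree.ElementTree`. The sample document and the helper names `move_subtree` and `nodes_with_min_children` are made up for this example; XPath only does the selecting here, and the actual move is ordinary Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample tree, loosely modelled on the one in the question.
DOC = """
<root>
  <node name="1">
    <node name="1.1">alpha</node>
  </node>
  <node name="2">
    <node name="2.1">data</node>
    <node name="2.2">more</node>
  </node>
</root>
"""


def move_subtree(root, source_name, target_name):
    """Move the <node> named source_name under the <node> named target_name."""
    # XPath-style selection by attribute value.
    source = root.find(f".//node[@name='{source_name}']")
    target = root.find(f".//node[@name='{target_name}']")
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    parent_of[source].remove(source)
    target.append(source)


def nodes_with_min_children(root, n):
    """Return the names of all <node> elements with at least n child elements."""
    return [e.get("name") for e in root.iter("node") if len(e) >= n]


if __name__ == "__main__":
    tree = ET.fromstring(DOC)
    move_subtree(tree, "2.1", "1")
    print(ET.tostring(tree, encoding="unicode"))
    print(nodes_with_min_children(tree, 2))
```

A full XPath engine (for example the one in lxml) can express conditions such as "at least 2 children" directly in the query, whereas the standard library subset needs the Python-side filter shown here.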

Daniel Brückner answered Nov 16 '22 23:11


There is TXL for pattern-based tree rewriting.

Tree rewriting with patterns is also done with parser toolkits such as ANTLR.

For code generation with bottom-up tree rewriting, search for BURS or BURG.
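To give a feel for what "a set of rewrite rules applied bottom-up" means, here is a minimal Python sketch. It is not TXL, ANTLR, or a BURS generator, and it does not use any of their APIs; the tuple tree encoding `(label, child, ...)`, the `rewrite` function, and the three example rules are all assumptions made up for illustration.

```python
def rewrite(tree, rules):
    """Rewrite children first, then try each rule at the current node."""
    if isinstance(tree, tuple):
        # Index 0 is the node label; the remaining items are subtrees.
        tree = (tree[0],) + tuple(rewrite(child, rules) for child in tree[1:])
    for rule in rules:
        result = rule(tree)
        if result is not None:
            # A rule fired: the replacement may itself match another rule.
            return rewrite(result, rules)
    return tree


def is_binary(tree, label):
    """True if tree is a node with the given label and exactly two children."""
    return isinstance(tree, tuple) and len(tree) == 3 and tree[0] == label


# Each rule returns a replacement tree, or None when its pattern does not match.
def add_zero(tree):
    # (add x 0) => x
    if is_binary(tree, "add") and tree[2] == 0:
        return tree[1]
    return None


def mul_one(tree):
    # (mul x 1) => x
    if is_binary(tree, "mul") and tree[2] == 1:
        return tree[1]
    return None


def mul_zero(tree):
    # (mul x 0) => 0
    if is_binary(tree, "mul") and tree[2] == 0:
        return 0
    return None


RULES = [add_zero, mul_one, mul_zero]

if __name__ == "__main__":
    expr = ("add", ("mul", "x", 1), ("mul", "y", 0))
    print(rewrite(expr, RULES))
```

Every rule in this sketch shrinks the tree, so the rewriting terminates; a rule set that can grow the tree would need its own termination argument.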

Doug Currie answered Nov 16 '22 22:11