Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hierarchy traversal and comparison modules for Python?

I deal with a lot of hierarchies in my day to day development. File systems, nested DAG nodes in Autodesk Maya, etc.

I'm wondering, are there any good modules for Python specifically designed to traverse and compare hierarchies of objects?

Of particular interest would be ways to do 'fuzzy' comparisons between two nearly identical hierarchies. Some of the reasons for doing this would be for matching two node hierarchies in Maya from two different characters in order to transfer animation from one to the other.

Based on what I've been reading, I'd probably need something with a name threshold (which I could build myself) for comparing how close two node names are to each other. I'd then need a way to optionally ignore the order that child nodes appear in the hierarchy. Lastly, I'd need to deal with a depth threshold, in cases where a node may have been slightly moved up or down the hierarchy.

like image 760
Soviut Avatar asked Mar 20 '09 03:03

Soviut


3 Answers

I'm not sure I see the need for a complete module -- hierarchies are a design pattern, and each hierarchy has enough unique features that it's hard to generalize.

class Node( object ):
    def __init__( self, myData, children=None )
        self.myData= myData
        self.children= children if children is not None else []
    def visit( self, aVisitor ):
        aVisitor.at( self )
        aVisitor.down()
        for c in self.children:
            aVisitor.at( c )
        aVisitor.up()

class Visitor( object ):
    def __init__( self ):
        self.depth= 0
    def down( self ):
        self.depth += 1
    def up( self ):
        self.depth -= 1

I find that this is all I need. And I've found that it's hard to make a reusable module out of this because (a) there's so little here and (b) each application adds or changes so much code.

Further, I find that the most commonly used hierarchy is the file system, for which I have the os module. The second most commonly used hierarchy is XML messages, for which I have ElementTree (usually via lxml). After those two, I use the above structures as templates for my classes, not as a literal reusable module.

like image 137
S.Lott Avatar answered Oct 13 '22 16:10

S.Lott


I recommend digging around xmldifff http://www.logilab.org/859 and seeing how they compare nodes and handle parallel trees. Or, try writing a [recursive] generator that yields each [significant] node in a tree, say f(t), then use itertools.izip(f(t1),f(t2)) to collect together pairs of nodes for comparison.

Most of the hierarchical structures I deal with have more than one "axis", like elements and attributes in XML, and some nodes are more significant than others.

For a more bizarre solution, serialize the two trees to text files, make a referential note that line #n comes from node #x in a tree. Do that to both trees, feed the files into diff, and scan the results to notice which parts of the tree have changed. You can map that line #n from file 1 (and therefore node #x in the first tree) and line #m from file 2 (and therefore node #y of the second tree) mean that some part of each tree is the same or different.

For any solution your are going to have to establish a "canonical form" of your tree, one that might drop all ignorable whitespace, display attributes, optional nodes, etc., from the comparison process. It might also mean doing a breadth first vs. depth first traversal of the tree(s).

like image 45
Joel Avatar answered Oct 13 '22 16:10

Joel


http://code.google.com/p/pytree/

these maybe overkill or not suited at all for what you need:

http://networkx.lanl.gov/

http://www.osl.iu.edu/~dgregor/bgl-python/

like image 1
Vasil Avatar answered Oct 13 '22 18:10

Vasil