I'm looking for a good data structure to build equivalence classes on nodes of a tree. In an ideal structure, the following operations should be fast (O(1)/O(n) as appropriate) and easy (no paragraphs of mystery code):
- (A) Walk the tree from the root; on each node --> child transition enumerate all the equivalent versions of the child node
- (B) Merge two equivalence classes
- (C) Create new nodes from a list of existing nodes (the children) and other data
- (D) Find any nodes structurally equivalent to node (i.e. they have the same number of children, corresponding children belong to the same equivalence class, and their "other data" is equal) so that new (or newly modified) nodes may be put in the right equivalence class (via a merge)
So far I've considered (some of these could be used in combination):
- A parfait, where the children are references to collections of nodes instead of to nodes. (A) is fast, (B) requires walking the tree and updating nodes to point to the merged collection, (C) requires finding the collection containing each child of the new node, (D) requires walking the tree
- Maintaining a hash of nodes by their characteristics. This makes (D) much faster but (B) slower (since the hash would have to be updated when equivalence classes were merged)
- String the nodes together into a circular linked list. (A) is fast, (B) would be fast but for the fact that that "merging" part of a circular list with itself actually splits the list (C) would be fast, (D) would require walking the tree
- Like above, but with an additional "up" pointer in each node, which could be used to find a canonical member of the circular list.
Am I missing a sweet alternative?
You seem to have two forms of equivalence to deal with. Plain equivalence (A), tracked as equivalence classes which are kept up to date and structural equivalence (D), for which you occasionally go build a single equivalence class and then throw it away.
It sounds to me like the problem would be conceptually simpler if you maintain equivalence classes for both plain and structural equivalence. If that introduces too much churn for the structural equivalence, you could maintain equivalence classes for some aspects of structural equivalence. Then you could find a balance where you can afford the maintenance of those equivalence classes but still greatly reduce the number of nodes to examine when building a list of structurally equivalent nodes.