I'm trying to implement a tree-processing algorithm in Haskell, and (due to the fact this is my first Haskell program!), am struggling with the design of the data structures. Can any of the FP gurus out there lend a hand?
I'll start by describing the important characteristics of the algorithm, sketch out how I would approach this using an imperative language, and finish with the stumbling baby-steps I have made so far in Haskell.
I won't describe the full algorithm in detail, but the salient points are as follows:
The data structures must therefore have the following characteristics:
If I was to implement this algorithm using an imperative language, the solution would look something like the following.
Let's assume that the starting point is the following definition of the input tree:
struct node {
// Identifier for this node, unique within the containing tree
size_t id;
// Label of this node
enum label label;
// Attributes of this node
// An attribute can be assumed to be a key-value pair
// Details of the attributes themselves aren't material to this
// discussion, so the "attribute" type is left opaque
struct attribute **attributes;
size_t n_attributes;
// Pointer to parent of this node
// NULL iff this node is root
struct node *parent;
// Pointer to first child of this node
// NULL iff this node is leaf
struct node *child;
// Double-linked list of siblings of this node
struct node *prev;
struct node *next;
};
The pointers embedded in each node clearly support the up / down / left / right traversals required by the algorithm.
Annotation can be implemented by defining the following structure:
struct algo_node {
// Pointer to input node which has been wrapped
struct node *node;
// Derived properties computed by first phase of the algorithm
// Details of the properties themselves aren't material to this
// discussion, so the "derived" type is left opaque
struct derived props;
// Pointer to corresponding node in the other tree
// NULL iff this node is unmatched
struct node *match;
};
The first phase of the algorithm constructs an algo_node for each node in each of the input trees.
Mapping from an algo_node to a node is trivial: follow the embedded *node pointer. Mapping in the other direction can be supported by storing algo_nodes in an array, indexed by the id of the input node.
This is of course just one possible implementation. Many variations are possible, including
list or queue interface, rather than storing three raw pointersstruct algo_nodeLet's start with the following definition of the input tree:
data Tree = Leaf Label Attributes
| Node Label Attributes [Tree]
Augmentation of each node with an id can be achieved as follows:
data AnnotatedTree = Tree Int
addIndex :: Int -> AnnotatedTree -> (AnnotatedTree, Int)
indexedTree = addIndex 0 tree
Similarly, we can write a function which computes derived properties:
data AnnotatedTree = Tree DerivedProperties
computeDerived :: DerivedProperties -> AnnotatedTree -> (AnnotatedTree, DerivedProperties)
derivedTree = computeDerived DefaultDerived tree
The above snippets can be adjusted with little work, such that AnnotatedTree contains both the index and the derived properties.
However, I don't know where to begin with representing mappings between the two trees. Based on some reading, I have some half-baked ideas ...
AnnotatedTree to contain a path from the root of the other tree to the mapped node - encoded as a list of indices into each successive children list, [Integer]
AnnotatedTree to contain a reference to the mapped node directly, as a Maybe Tree
... but I could really do with some guidance on which (if any) of these are worth pursuing.
Any help would be much appreciated!
You can label the tree nodes with Int id's, and walk around them with zippers (using Data.Tree and Data.Tree.Zipper is a good idea because there's no need to reinvent the wheel). You can then attach auxiliary attributes to the nodes using a Data.IntMap to map node id's to whatever you want. In particular, you can create an IntMap to map from a node id to the TreePos Full Int for that node so you can explore the parent, siblings, and children of that node.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With