I have a forest of trees. Like 20,000 trees. This forest occupies too much memory. But these trees are similar - you could find groups of trees (for ~200 trees) so that they have a common subtree of quite a significant size (tens of %).
So knowing that:
Trees are similar i.e. they share a common connected subgraph including the root (not necessarily including the leaves - but possibly).
Does there exist any data structure that allows for efficient storing of that information? Once the structure is created, I'm only interested in reading.
It doesn't necessarily be a solution tight to .NET, I could code it from scratch, I just need the idea :D But of course, if there is some little-known structure in .NET that kind of achieves that, I would be pleased to know.
I have a feeling that this shared memory stuff may have something to do with immutable structures that by definition are expected to share memory...
My trees are not binary search trees, unfortunately. They can have any amount of children.
As for reading, it is quite simple. I am always navigating from the root to a leaf. As you would in any JSON or XML, given an exact path to a value.
The connected subgraph including the root that is same (potentially) among two trees always contains the root and spans down. In some cases it is possible to even reach the leaves. See an example (the yellow part is the connected subgraph including the root):
Given these rules, mathematically speaking all the trees are similar - the connected subgraph is either empty, or it contains only the root, or inductively - it contains the root and its children...
A tree is a hierarchical data structure. Tree is a non-linear data structure which contains nodes and edges. Why Tree? Unlike Array and Linked List, which are linear data structures, tree is hierarchical (or non-linear) data structure.
Array is a good static data structure that can be accessed randomly and is fairly easy to implement. Linked Lists on the other hand is dynamic and is ideal for application that requires frequent operations such as add, delete, and update.
Linked representation Binary trees in linked representation are stored in the memory as linked lists. These lists have nodes that aren't stored at adjacent or neighboring memory locations and are linked to each other through the parent-child relationship associated with trees.
Trees can be represented in two ways as listed below: Dynamic Node Representation (Linked Representation). Array Representation (Sequential Representation).
You can group children of your tree node by different "owners". When you add a node, you specify owner (or null to use default "shared" owner). When you traverse your tree, you also specify owner. Here is a sketch code:
class TreeNode {
protected static readonly object SharedOwner = new object();
}
class TreeNode<T> : TreeNode {
private readonly T _data;
private readonly Dictionary<object, List<TreeNode<T>>> _children;
public TreeNode(T data) {
this._data = data;
_children = new Dictionary<object, List<TreeNode<T>>>();
}
public TreeNode<T> AddChild(T data, object owner = null) {
if (owner == null)
owner = SharedOwner;
if (!_children.ContainsKey(owner))
_children.Add(owner, new List<TreeNode<T>>());
var added = new TreeNode<T>(data);
_children[owner].Add(added);
return added;
}
public void Traverse(Action<T> visitor, object owner = null) {
TraverseRecursive(this, visitor, owner);
}
private void TraverseRecursive(TreeNode<T> node, Action<T> visitor, object owner = null) {
visitor(node._data);
// first traverse "shared" owner's nodes
if (node._children.ContainsKey(SharedOwner)) {
foreach (var sharedNode in node._children[SharedOwner]) {
TraverseRecursive(sharedNode, visitor, owner);
}
}
// then real owner's nodes
if (owner != null && owner != SharedOwner && node._children.ContainsKey(owner)) {
foreach (var localNode in node._children[owner]) {
TraverseRecursive(localNode, visitor, owner);
}
}
}
}
And a sample usage:
class Program {
static void Main(string[] args) {
// this is shared part
var shared = new TreeNode<string>("1");
var leaf1 = shared.AddChild("1.1").AddChild("1.1.1");
var leaf2 = shared.AddChild("1.2").AddChild("1.2.1");
var firstOwner = new object();
var secondOwner = new object();
// here we branch first time
leaf1.AddChild("1.1.1.1", firstOwner);
leaf2.AddChild("1.2.1.1", firstOwner);
// and here another branch
leaf1.AddChild("1.1.1.2", secondOwner);
leaf2.AddChild("1.2.1.2", secondOwner);
shared.Traverse(Console.WriteLine, firstOwner);
shared.Traverse(Console.WriteLine, secondOwner);
Console.ReadKey();
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With