Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Theoretically, what data structure can I use for trees with shared memory?

Real world problem

I have a forest of trees. Like 20,000 trees. This forest occupies too much memory. But these trees are similar - you could find groups of trees (for ~200 trees) so that they have a common subtree of quite a significant size (tens of %).

Theory

So knowing that:

Trees are similar i.e. they share a common connected subgraph including the root (not necessarily including the leaves - but possibly).

Does there exist any data structure that allows for efficient storing of that information? Once the structure is created, I'm only interested in reading.

It doesn't necessarily be a solution tight to .NET, I could code it from scratch, I just need the idea :D But of course, if there is some little-known structure in .NET that kind of achieves that, I would be pleased to know.

I have a feeling that this shared memory stuff may have something to do with immutable structures that by definition are expected to share memory...

My trees are not binary search trees, unfortunately. They can have any amount of children.

Reading

As for reading, it is quite simple. I am always navigating from the root to a leaf. As you would in any JSON or XML, given an exact path to a value.

Nature of similarity

The connected subgraph including the root that is same (potentially) among two trees always contains the root and spans down. In some cases it is possible to even reach the leaves. See an example (the yellow part is the connected subgraph including the root):

enter image description here

Given these rules, mathematically speaking all the trees are similar - the connected subgraph is either empty, or it contains only the root, or inductively - it contains the root and its children...

like image 953
Patryk Golebiowski Avatar asked May 06 '16 06:05

Patryk Golebiowski


People also ask

Which data structure is used in trees?

A tree is a hierarchical data structure. Tree is a non-linear data structure which contains nodes and edges. Why Tree? Unlike Array and Linked List, which are linear data structures, tree is hierarchical (or non-linear) data structure.

Which data structure is the most efficient for tree creation?

Array is a good static data structure that can be accessed randomly and is fairly easy to implement. Linked Lists on the other hand is dynamic and is ideal for application that requires frequent operations such as add, delete, and update.

How can trees be represented in memory?

Linked representation Binary trees in linked representation are stored in the memory as linked lists. These lists have nodes that aren't stored at adjacent or neighboring memory locations and are linked to each other through the parent-child relationship associated with trees.

What are the two ways of representing binary trees in the memory?

Trees can be represented in two ways as listed below: Dynamic Node Representation (Linked Representation). Array Representation (Sequential Representation).


1 Answers

You can group children of your tree node by different "owners". When you add a node, you specify owner (or null to use default "shared" owner). When you traverse your tree, you also specify owner. Here is a sketch code:

class TreeNode {
    protected static readonly object SharedOwner = new object();
}

class TreeNode<T> : TreeNode {        
    private readonly T _data;
    private readonly Dictionary<object, List<TreeNode<T>>> _children;

    public TreeNode(T data) {
        this._data = data;
        _children = new Dictionary<object, List<TreeNode<T>>>();
    }

    public TreeNode<T> AddChild(T data, object owner = null) {
        if (owner == null)
            owner = SharedOwner;
        if (!_children.ContainsKey(owner))
            _children.Add(owner, new List<TreeNode<T>>());
        var added = new TreeNode<T>(data);
        _children[owner].Add(added);
        return added;
    }

    public void Traverse(Action<T> visitor, object owner = null) {
        TraverseRecursive(this, visitor, owner);
    }

    private void TraverseRecursive(TreeNode<T> node, Action<T> visitor, object owner = null) {
        visitor(node._data);
        // first traverse "shared" owner's nodes
        if (node._children.ContainsKey(SharedOwner)) {
            foreach (var sharedNode in node._children[SharedOwner]) {
                TraverseRecursive(sharedNode, visitor, owner);
            }
        }
        // then real owner's nodes
        if (owner != null && owner != SharedOwner && node._children.ContainsKey(owner)) {
            foreach (var localNode in node._children[owner]) {
                TraverseRecursive(localNode, visitor, owner);
            }
        }
    }
}

And a sample usage:

class Program {
    static void Main(string[] args) {
        // this is shared part
        var shared = new TreeNode<string>("1");
        var leaf1 = shared.AddChild("1.1").AddChild("1.1.1");
        var leaf2 = shared.AddChild("1.2").AddChild("1.2.1");
        var firstOwner = new object();
        var secondOwner = new object();
        // here we branch first time
        leaf1.AddChild("1.1.1.1", firstOwner);
        leaf2.AddChild("1.2.1.1", firstOwner);
        // and here another branch
        leaf1.AddChild("1.1.1.2", secondOwner);
        leaf2.AddChild("1.2.1.2", secondOwner);
        shared.Traverse(Console.WriteLine, firstOwner);
        shared.Traverse(Console.WriteLine, secondOwner);
        Console.ReadKey();
    }        
}
like image 85
Evk Avatar answered Oct 29 '22 09:10

Evk