Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parallel tree traversal in C#

I need to traverse a tree quickly, and I would like to do it in parallel. I'd rather use the parallel extensions than manually spin up a bunch of threads.

My current code looks something like this:

   public void Traverse(Node root)
    {
        var nodeQueue = new Queue<Node>();
        nodeQueue.Enqueue(root);
        while (nodeQueue.Count!=0)
        {
            var node = nodeQueue.Dequeue();
            if (node.Property = someValue) DoSomething(node);
            foreach (var node in node.Children)
            {
                nodeQueue.Enqueue(node);
            }
        }
    }

I was really hoping that Parallel.ForEach had a Parallel.While analog. I came across Stephen Toub's article on Implementing Parallel While with Parallel.ForEach. If read it correctly this still won't work because I am mutating the queue I am trying to iterate.

Do I need to use a task factory and recursion (and is that risky?) ? or is there some simple solution I am overlooking?

Edit: @svick

The tree has just over 250,000 nodes. The maximum depth right now is 14 nodes deep including the root.

There are about 500 nodes off the root, and the balance after that has a fairly random distribution. I'll get some better stats on the distribution soon.

@Enigmativity:

Yes, the tree is being modified, concurrently by many users, but I will usually have a shared read lock for the tree or sub tree, or allow for dirty reads.

Calls to node.Children can be considered atomic.

DoSomething is really one of several delegates, for some expensive operations I will probably gather a snapshot list of nodes and process them outside the traversal.

I realized that I should probably look at the general case (a sub-tree being traversed instead of the entire tree.) To that end I ran traverse on every node of the tree and looked at the total time.

I used a Parallel.ForEach(nodes, Traverse) for each traversal algorithm, where nodes contained all ~250k nodes. This simulated (sort of) a lot of users simultaneously requesting a lot of different nodes.

00256ms Breadth First Sequential

00323ms Breadth First Sequential with work (i incremented a static counter as "work")

01495ms Kirks First answer

01143ms Svicks Second answer

00000ms Recursive Single Threaded didn't finish after 60s

00000ms Enigmativity's answer didn't finish after 60s

@Enigma, I think it's possible I might have messed up your alogrithm somehow, because it seems like it should be much quicker.

The results surprised me to say the least. I had to add some work to the breadth first sequential just to convince myself that the compiler wasn't magically optimizing away the traversals.

For the single traversal of the head, parallelizing the first level only had the best performance. But just barely, this number improved as I added more nodes to the second level (2000 instead of 500).

like image 836
Jason Hernandez Avatar asked Aug 17 '11 21:08

Jason Hernandez


2 Answers

The most direct way would be to create a Task for each child node and then wait for all of them:

public void Traverse(Node root)
{
    if (node.Property == someValue)
        DoSomething(node);

    var tasks = new List<Task>();

    foreach (var node in node.Children)
    {
        // tmp is necessary because of the way closures close over loop variables
        var tmp = node;
        tasks.Add(Task.Factory.StartNew(() => Traverse(tmp)));
    }

    Task.WaitAll(tasks.ToArray());
}

Task is fairly light-weight, so creating lots of them works reasonably well. But they do have some overhead, so doing something more complicated like having a few tasks that share a queue is probably going to be faster. If that's the way you're going to go, don't forget that empty queue doesn't mean all work is done. Classes from the System.Collections.Concurrent namespace are going to come handy if you went this way.

EDIT: Because of the shape of the tree (the root has about 500 children), processing just the first level in parallel should give good performance:

public void Traverse(Node root, bool parallel = true)
{
    if (node.Property == someValue)
        DoSomething(node);

    if (parallel)
    {
        Parallel.ForEach(node.Children, node =>
        {
            Traverse(node, false);
        });
    }
    else
    {
        foreach (var node in node.Children)
        {
            Traverse(node, false);
        }
    }
}
like image 99
svick Avatar answered Sep 19 '22 12:09

svick


Since the traversal of the tree is extremely fast, that calls to Children are atomic, and that it is the expensive nature of the DoSomething delegates that need to be executed in parallel, here's my take on the solution.

I started with the idea that I needed a function that takes a node as a parameter, creates a task that executes DoSomething, recursively calls itself to create tasks for all of the children nodes, and finally returns a Task that waits for all of the internal tasks to be completed.

Here it is:

Func<Node, Task> createTask = null;
createTask = n =>
{
    var nt = Task.Factory.StartNew(() =>
    {
        if (n.Property == someValue)
            DoSomething(n);
    });
    var nts = (new [] { nt, })
        .Concat(n.Children.Select(cn => createTask(cn)))
        .ToArray();

    return Task.Factory.ContinueWhenAll(nts, ts => { });
};

All that is required to call it and wait for the traversal to complete is:

createTask(root).Wait();

I tested this by creating a tree of nodes with 500 children off of the root with 14 levels, with 1 or 2 subsequent children per node. This gave me a total of 319,501 nodes.

I created a DoSomething method that performed some work - for (var i = 0; i < 100000 ; i++) { }; - and then ran the above code and compared it to processing the same tree in series.

The parallel version took 5,151 ms. The sequential version 13,746 ms.

I also performed a test where I reduced the number of nodes to 3,196 and increased the processing time for DoSomething by 100x. The TPL very cleverly reverts to running sequentially if its tasks complete quickly so lengthening the processing time made the code run with more parallelism.

Now the parallel version took 3,203ms. The sequential version took 11,581ms. And, if I only called the createTask(root) function without waiting for it to complete it took just 126ms. This means that the tree is traversed very quickly, and it would then make sense to lock the tree during traversal and unlock it when processing is taking place.

I hope this helps.

like image 42
Enigmativity Avatar answered Sep 19 '22 12:09

Enigmativity