Strategy to implement tree traversing algorithm in parallel?

Tags: algorithm, design-patterns, tree, parallel-processing, tree-traversal

I have implemented an iterative algorithm, where each iteration involves a pre-order tree traversal (sometimes called downwards accumulation) followed by a post-order tree traversal (upwards accumulation). Each visit to each node involves calculating and storing information to be used for the next visit (either in the subsequent post-order traversal, or the subsequent iteration).

During the pre-order traversal, each node can be processed independently as long as all nodes between it and the root have already been processed. After processing, each node needs to pass a tuple (specifically, two floats) to each of its children. On the post-order traversal, each node can be processed independently as long as all of its subtrees (if any) have already been processed. After processing, each node needs to pass a single float to its parent.

The structure of the trees is static and does not change during the algorithm. However, during the downward traversal, if the two floats being passed both become zero, the entire subtree under this node does not need to be processed, and the upward traversal for this node can begin. (The subtree must be preserved, because on subsequent iterations the floats passed to this node may become non-zero, and traversals of the subtree would then resume.)

The intensity of computation at each node is the same across the tree. The computation at each node is trivial: just a few sums and multiplies/divides on a list of numbers whose length equals the number of children at the node.
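For illustration, here is a minimal serial sketch of one iteration as described above. It fuses the downward and upward visits into a single recursion, which the stated dependencies allow. The `Node` class, the `weight` field, the `visit` function and all of the arithmetic are hypothetical placeholders, not the real per-node computation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    weight: float = 1.0  # hypothetical per-node parameter
    children: List["Node"] = field(default_factory=list)
    up: float = 0.0      # value stored by the upward visit


def visit(node: Node, a: float, b: float) -> float:
    """Fused down/up visit; returns the single float passed to the parent."""
    if a == 0.0 and b == 0.0:
        # Pruning rule: skip the whole subtree but keep it in memory,
        # since a later iteration may pass non-zero floats again.
        node.up = 0.0
        return node.up
    k = len(node.children)
    # Downward step (placeholder arithmetic): pass two floats to each child.
    results = [visit(c, a * node.weight / k, b / k) for c in node.children]
    # Upward step (placeholder arithmetic): combine the children's floats.
    node.up = a + b + sum(results)
    return node.up


root = Node(children=[Node(), Node()])
print(visit(root, 2.0, 4.0))  # 12.0
```

The recursion makes the dependencies explicit: a child is visited only after its parent has computed the pair, and a parent finishes only after all of its children have returned.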

The trees being processed are unbalanced: a typical node has 2 leaves plus 0-6 additional child nodes. So how to partition the tree into a set of relatively balanced subtrees is non-obvious (to me). Further, the trees are designed to consume all available RAM: the bigger the tree I can process, the better.

My serial implementation attains on the order of 1000 iterations per second on just my little test trees; with the "real" trees, I expect it might slow by an order of magnitude (or more?). Given that the algorithm requires at least 100 million iterations (possibly up to a billion) to reach an acceptable result, I'd like to parallelize the algorithm to take advantage of multiple cores. I have zero experience with parallel programming.

What is the recommended pattern for parallelization given the nature of my algorithm?

asked Feb 09 '10 00:02

travis

2 Answers

Try to rewrite your algorithm so that it is composed of pure functions. That means every piece of code is essentially a (small) static function that does not depend on global or static variables, all data is treated as immutable (changes are made only to copies), and functions manipulate state (in a loose sense of the word "manipulate") only by returning new data.

If every function is referentially transparent, meaning it depends only on its input (and no hidden state) to compute its output, and every call with the same input always yields the same output, then you are in a good position to parallelize the algorithm. Since your code never mutates global variables (or files, servers, etc.), the work a function does can be safely repeated (to recompute the function's result) or skipped entirely (no later code depends on the function's side effects, so skipping a call won't break anything). Then, when you run your suite of functions (for example on some implementation of MapReduce, Hadoop, etc.), the only dependencies between functions are data dependencies: the output of one function is the input of another. WHAT you are trying to compute (via pure functions) is then completely separate from the ORDER in which it is computed, and the order becomes a question for the scheduler of a framework like MapReduce.

A great way to learn this mode of thinking is to write your algorithm in the programming language Haskell (or something like F# or OCaml), which has great support for parallel/multicore programming out of the box. Haskell forces your code to be pure, so if your algorithm works, it is probably easily parallelizable.
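As a rough illustration of the idea (in Python rather than Haskell), the sketch below writes the upward result as a pure function over an immutable tree. Because nothing is mutated, the children can be evaluated one after another or by a thread pool and the result is the same. The `Tree` type, the function names and the arithmetic are assumptions made for this example; the question does not give the real per-node computation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import NamedTuple, Tuple


class Tree(NamedTuple):
    """Immutable tree node; `weight` is a hypothetical parameter."""
    weight: float
    children: Tuple["Tree", ...] = ()


def up_value(tree: Tree, a: float, b: float) -> float:
    """Pure: the result depends only on the arguments; nothing is mutated."""
    if a == 0.0 and b == 0.0:
        return 0.0  # pruned subtree (placeholder result)
    k = len(tree.children)
    # Placeholder arithmetic standing in for the real computation.
    return a + b + sum(up_value(c, a * tree.weight / k, b / k)
                       for c in tree.children)


def up_value_threaded(tree: Tree, a: float, b: float) -> float:
    """Same result, but the root's children are evaluated by a thread pool."""
    if a == 0.0 and b == 0.0:
        return 0.0
    k = len(tree.children)
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in input order, so the sum is unchanged.
        parts = list(pool.map(
            lambda c: up_value(c, a * tree.weight / k, b / k),
            tree.children))
    return a + b + sum(parts)


tree = Tree(1.0, (Tree(1.0), Tree(1.0)))
print(up_value(tree, 2.0, 4.0) == up_value_threaded(tree, 2.0, 4.0))  # True
```

The values a real implementation stores between iterations would have to be returned as new data instead of written into the nodes, which is the trade-off this style asks for.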

answered Sep 19 '22 15:09

Jared Updike

The usual method is to use some kind of depth-first work-splitting. You start with a number of workers waiting on an idle queue, and one worker starting a traversal at the root. A worker with work traverses depth first, and whenever it is at a node with more than one child left to be done, it checks the idle worker queue and, if the queue is not empty, farms off a subtree (child) to another worker. There is some complication in handling the join when a worker finishes a subtree, but in general this can work well for a variety of tree structures (balanced or unbalanced).
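A minimal sketch of this scheme, under some simplifying assumptions: the idle-worker queue is replaced by a semaphore that counts idle workers, a new thread is started for each handed-off subtree instead of waking a waiting one, and the join is a plain `Thread.join`. The names `Node`, `Splitter` and `visit`, and the per-node arithmetic, are hypothetical.

```python
import threading
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    weight: float = 1.0  # hypothetical per-node parameter
    children: List["Node"] = field(default_factory=list)


class Splitter:
    def __init__(self, workers: int):
        # Counts idle workers; the worker that starts at the root is busy.
        self.idle = threading.Semaphore(workers - 1)

    def visit(self, node: Node, a: float, b: float) -> float:
        if a == 0.0 and b == 0.0:
            return 0.0  # pruned subtree (placeholder result)
        k = len(node.children)
        results = [0.0] * k
        handed_off = []
        for i, child in enumerate(node.children):
            args = (child, a * node.weight / k, b / k)  # placeholder math
            # Farm off a child only if more children remain for this
            # worker and an idle worker is available right now.
            if i < k - 1 and self.idle.acquire(blocking=False):
                t = threading.Thread(target=self._run,
                                     args=(results, i) + args)
                t.start()
                handed_off.append(t)
            else:
                results[i] = self.visit(*args)
        for t in handed_off:  # join: wait for the farmed-off subtrees
            t.join()
        return a + b + sum(results)

    def _run(self, results, i, child, a, b):
        try:
            results[i] = self.visit(child, a, b)
        finally:
            self.idle.release()  # this worker is idle again


root = Node(children=[Node(), Node(), Node(children=[Node(), Node()])])
print(Splitter(4).visit(root, 3.0, 5.0) == Splitter(1).visit(root, 3.0, 5.0))
```

Each thread writes to its own slot in `results`, so no lock is needed for the join. In CPython the global interpreter lock generally prevents CPU-bound Python threads from running in parallel, so this shows only the scheduling pattern; an actual speedup would need a runtime with truly parallel threads.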

answered Sep 22 '22 15:09

Chris Dodd
