I'm halfway through implementing a parallel depth-first search algorithm in MPI, and I'm thinking about also trying it in CUDA / OpenCL, just for fun and out of curiosity. The algorithm is simple but not trivial. The single-core version in C is about 200 lines of code.
How suitable is GPGPU for this kind of problem?
Tree search operations are not so simple to implement in CUDA. There are some papers, like the one
And another, rather simple implementation (not quite a massively parallel one, in my opinion)
The difficulty comes from the fact that tree operations generally involve decision making, and different branches are taken depending on those decisions. Threads that follow different branches cannot execute in lockstep, so massively parallelizing the work without overlapping subtrees or repeating operations is quite hard.
Some approaches traverse trees with explicit stack or queue implementations instead of recursion.
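As a rough illustration of the stack-based idea, here is a minimal single-threaded sketch in C. It is not taken from any of the papers mentioned. The `Tree` layout (children stored as array indices, with -1 meaning "no child"), the `MAX_NODES` limit, and the function name `dfs_preorder` are all assumptions made up for this example. It only shows how recursion can be replaced by an explicit stack, which is usually the first step before attempting a GPU port.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64  /* hypothetical upper bound on the tree size */

/* Hypothetical array-based binary tree: a child index of -1 means "no child". */
typedef struct {
    int left[MAX_NODES];
    int right[MAX_NODES];
    int n;  /* number of nodes in use */
} Tree;

/* Iterative pre-order DFS using an explicit stack instead of recursion.
 * Writes the visited node indices into `order` and returns the count. */
int dfs_preorder(const Tree *t, int root, int *order)
{
    int stack[MAX_NODES];
    int top = 0;    /* number of entries currently on the stack */
    int count = 0;  /* number of nodes visited so far */

    if (root < 0 || root >= t->n)
        return 0;
    stack[top++] = root;

    while (top > 0) {
        int node = stack[--top];
        order[count++] = node;

        /* Push the right child first so the left child is visited first. */
        if (t->right[node] >= 0) {
            assert(top < MAX_NODES);
            stack[top++] = t->right[node];
        }
        if (t->left[node] >= 0) {
            assert(top < MAX_NODES);
            stack[top++] = t->left[node];
        }
    }
    return count;
}
```

For a five-node tree where node 0 has children 1 and 2, and node 1 has children 3 and 4, calling `dfs_preorder(&t, 0, order)` visits the nodes in the order 0, 1, 3, 4, 2. Swapping the stack for a FIFO queue would turn the same loop into a breadth-first traversal.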
You may find a similar question here: Error: BFS on CUDA Synchronization