What to take into account for selecting a parallelization scheme?

I'm developing some code in C++ for my research in computational dynamics. My code solves sparse and dense linear systems, generates meshes, and performs similar operations, all implemented in a fairly straightforward way. I needed to parallelize my code to reduce the computational time, and I used OpenMP for that purpose.

But after a closer look at commercially available codes, like ANSYS CFX, I noticed that the parallelization scheme used in that software is MPICH2, which is an implementation of MPI.

So there are a lot of parallelization tools/APIs:

  • OpenMP
  • MPI
  • Intel Threading Building Blocks
  • Pthreads
  • Microsoft PPL

I have used some of these tools and managed to reach 100% CPU usage on my local machine with each of them.

I don't know what criteria I should pay attention to when choosing the proper parallelization tool. What kinds of applications require which tool? Are any of the above suitable for research purposes? Which of them are used most in commercial software?

asked Dec 26 '22 by Emre Turkoz

1 Answer

As with many questions of this type, there is no single definitive answer. You can't really say which is better, because the answer is always "it depends": on what you're doing, on how your code is written, on your portability requirements, and so on.

Following your list:

  • OpenMP: it's pretty standard and I found it really easy to use. Even if the original code was not written with parallelization in mind, this library makes a step-by-step approach very easy. I think it's a good entry point for parallel computing because it makes everything easy, but it's hard to debug, limited in performance, and it just makes code parallel: it lacks parallel algorithms, structures, and primitives, and you can't span the work across a network.
  • Message Passing Interface (MPI): from my point of view, a library based on this standard is best suited to spanning a large computation across a cluster. If you have a few computers and you want to run a computation in parallel across them, this is a good choice: well known and stable. It's not (again, in my view) a solution for local parallelization. If you're looking for a well-known, widely used standard for grid computing, then MPI is for you.
  • Intel Threading Building Blocks: this is a C++ library that unifies the multithreading interface across different environments (pthreads or the Windows threading model). A library like this is useful if you need to be portable across compilers and environments. Moreover, using this library doesn't limit you, so it can be integrated well with something else (for example MPI). You should take a look at the library to see if you like it; it's a very good choice with a good design, well documented, and widely used.
  • Microsoft Parallel Patterns Library: this is a very big library. It's quite new, so I don't feel confident recommending it without good testing, and moreover it's Microsoft-specific, so you're tied to their compiler. That said, from what I've seen it's a great library. It abstracts a lot of details, it's well designed, and it provides a very high-level view of the concept of a "parallel task". Again, using this library doesn't stop you from using, for example, MPI for clusters (though the Concurrency Runtime has its own library for that).

What should you use? I don't have an answer; just try them and pick whichever you feel most comfortable with (take a look at Boost Threads too). Note that you can mix them to some extent: for example OpenMP+MPI, MPI+TBB, or even MPI+PPL. My preference is for PPL, but if you're developing a real-world application you may need long testing to decide which is better. I actually like the Concurrency Runtime (the base of PPL) because it's "horizontal": it provides a basic framework (with structures and algorithms) for parallel computing, plus a lot of "vertical" packages (Agents, PPL, TPL).

That said, once you've made your computation parallel, you may need to improve the performance of some CPU-intensive routine. You may consider using the GPU for that task; I think it offers its best for short, massively parallel computations. (Personally I prefer OpenCL over the proprietary CUDA, even if CUDA performance may be higher.) You might even take a look at OpenHMPP if you're interested in this topic.

answered Jan 14 '23 by Adriano Repetti