I need to write a scientific application in C++ that does a lot of computation and uses a lot of memory. Part of the job is already written, but given the high resource requirements I was thinking of moving to OpenMPI.
Before doing that, I have a simple question: if I understood the principle of OpenMPI correctly, it is the developer who has the task of splitting the job over the different nodes, calling SEND and RECEIVE depending on which nodes are available at the time.
Do you know whether there exists some library, OS, or anything else with this capability that would let my code remain as it is now? Basically, something that connects all the computers and lets them share their memory and CPU as if they were one machine?
I am a bit confused by the huge volume of material available on the topic. Should I look at cloud computing, or at Distributed Shared Memory?
Currently there is no C++ library or utility that will automatically parallelize your code across a cluster of machines. Granted, there are plenty of ways to achieve distributed computing with other approaches, but you really want your application optimized to use message passing or distributed shared memory.
Your best bets are those two approaches. Implementing a parallel distributed solution is one thing; making it work efficiently is another. Read up on the different topologies and parallel computing patterns to make implementing a solution less painful than starting from scratch.
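For concreteness, here is a minimal sketch of that send/receive style, using the plain MPI C API (Open MPI is one implementation of it); error handling is omitted and the payload is a placeholder.

```cpp
// Minimal sketch of the send/receive style discussed above, using the plain
// MPI C API. Error handling is omitted; the payload is a stand-in for data.
// Build and run (assuming an MPI toolchain): mpicxx demo.cpp && mpirun -np 2 ./a.out
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        double payload = 42.0;  // result computed by rank 0
        MPI_Send(&payload, 1, MPI_DOUBLE, /*dest=*/1, /*tag=*/0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        double payload = 0.0;
        MPI_Recv(&payload, 1, MPI_DOUBLE, /*source=*/0, /*tag=*/0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("rank 1 received %f\n", payload);
    }

    MPI_Finalize();
    return 0;
}
```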
Well, you haven't actually stated what hardware you are targeting. If it's a shared-memory machine, then OpenMP is an option. Most parallel programmers would regard parallelisation with OpenMP as an easier option than using MPI in any of its incarnations. I'd also suggest that it is easier to retrofit OpenMP onto an existing code than MPI. The best MPI programs, in the sense of best-performing, are those designed from the ground up to be parallelised with message passing.
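To show how lightweight the retrofit can be, here is a small sketch: a single pragma spreads an existing loop across the cores of a shared-memory machine, with the loop body standing in for real work.

```cpp
// A sketch of retrofitting OpenMP onto existing code: the sequential loop is
// unchanged except for one pragma, which spreads iterations across the cores
// of a shared-memory machine. Build with, e.g., g++ -fopenmp demo.cpp
#include <cmath>
#include <vector>

int main() {
    std::vector<double> data(10000000, 1.0);

    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(data.size()); ++i) {
        data[i] = std::sqrt(data[i]) * 2.0;  // stand-in for real work
    }
    return 0;
}
```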
In addition, the best sequential algorithm might not always be the most efficient algorithm once it has been parallelised. Sometimes a simple but sequentially sub-optimal algorithm is the better choice.
You may well already have access to a shared-memory computer: any modern multicore machine is one.
If you are stuck with message passing, then I'd strongly advise you to stick with C++ and OpenMPI (or whatever MPI is already installed on your system), and you should definitely look at Boost.MPI too. I advise this strongly because, once you step outside the mainstream of high-performance scientific computing, you may find yourself in an army of one, programming with an idiosyncratic collection of just-fit-for-research libraries and other tools. C++, OpenMPI and Boost are sufficiently well used that you can regard them as being of 'weapons grade', or whatever your preferred analogy might be. There's little enough traffic on SO, for example, on MPI and OpenMP; check out the stats on the other technologies before you bet the farm on them.
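As a taste of why Boost.MPI is worth a look, here is a sketch of a two-rank exchange; it assumes Boost.MPI is built against whatever MPI you have installed, and its appeal is that standard C++ types such as std::string can be sent directly.

```cpp
// A sketch of a send/receive exchange via Boost.MPI, which wraps the MPI C
// API in idiomatic C++ (here, sending a std::string directly).
// Link with -lboost_mpi -lboost_serialization.
#include <boost/mpi.hpp>
#include <boost/serialization/string.hpp>
#include <iostream>
#include <string>

namespace mpi = boost::mpi;

int main(int argc, char* argv[]) {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    if (world.rank() == 0) {
        std::string msg = "hello from rank 0";
        world.send(/*dest=*/1, /*tag=*/0, msg);
    } else if (world.rank() == 1) {
        std::string msg;
        world.recv(/*source=*/0, /*tag=*/0, msg);
        std::cout << msg << '\n';
    }
    return 0;
}
```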
If you have no experience with MPI, then you might want to look at a book called Parallel Scientific Computing in C++ and MPI by Karniadakis and Kirby. Using MPI by Gropp et al. is OK as a reference, but it's not a beginner's text on programming for message passing.
If message passing is holding you back, try distributed objects. There are a lot of distributed-object frameworks available: CORBA, DCOM, and ICE, to name a few. If you choose to distribute your objects, they will have global visibility through the interfaces (both data and methods) you define. Any object on any node can access these distributed objects.
I have been searching for software that distributes memory itself, but haven't come across any. I guess that's because all these distributed-object frameworks are available, so people have no need to distribute memory as such.
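To make the idea concrete, here is an illustrative sketch of what a distributed-object contract looks like; in a real framework the interface would be written in the framework's IDL and the proxy generated for you, and every name below is hypothetical.

```cpp
// Illustrative only: the shape of a distributed-object interface. With a real
// framework (CORBA, ICE, ...) this contract would be declared in the
// framework's IDL, and generated proxy/skeleton code would forward calls
// across the network. Every name below is hypothetical.
#include <iostream>
#include <vector>

// The interface every node sees; a call may be local or go through a proxy.
class Accumulator {
public:
    virtual ~Accumulator() = default;
    virtual double addChunk(const std::vector<double>& chunk) = 0;
};

// A local, in-process implementation standing in for a remote servant.
class LocalAccumulator : public Accumulator {
public:
    double addChunk(const std::vector<double>& chunk) override {
        for (double v : chunk) total_ += v;
        return total_;
    }
private:
    double total_ = 0.0;
};

int main() {
    LocalAccumulator impl;
    Accumulator& service = impl;  // callers program against the interface only
    std::cout << service.addChunk({1.0, 2.0, 3.0}) << '\n';  // prints 6
    return 0;
}
```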
I had a good experience using Top-C in graduate school.
From the home page: "TOP-C especially distinguishes itself as a package to easily parallelize existing sequential applications."
http://www.ccs.neu.edu/home/gene/topc.html
Edit: I should add that it's much simpler to parallelize a program if it exhibits "trivial parallelism", i.e. the nodes don't need to share memory. MapReduce is built on this concept. If you can minimize the amount of shared state your nodes use, you'll see orders-of-magnitude improvements from parallel processing.
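Here is a sketch of what that looks like in MPI terms: each rank owns its slice of the work, there is no shared state, and the only communication is a single reduction at the end (even division of work is assumed for brevity).

```cpp
// A sketch of trivial parallelism with MPI: each rank processes its own slice
// with no shared state, and the only communication is one reduction at the
// end. Assumes the work divides evenly across ranks, for brevity.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n = 1000000;        // total work items
    const long chunk = n / size;   // this rank's share
    const long begin = rank * chunk;

    double local = 0.0;
    for (long i = begin; i < begin + chunk; ++i) {
        local += static_cast<double>(i);  // stand-in for real computation
    }

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, /*root=*/0,
               MPI_COMM_WORLD);

    if (rank == 0) std::printf("total = %f\n", total);

    MPI_Finalize();
    return 0;
}
```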