I am looking for a framework to be used in a C++ distributed number crunching application.
The setup looks as follows:
There is a master node which divides the problem domain into small independent tasks. The tasks are distributed to worker nodes of different capability (e.g. CPU type/GPU-enabled). Worker nodes are dynamically added to the compute grid as they become available. It may also happen that a worker node dies without saying goodbye.
I am searching for a fast C/C++ framework to accomplish this setup.
To summarize, my main requirements are:
You can certainly do what you want with MPI. MPI-2 added dynamic process management features, and I think most of the currently widely-used implementations offer these.
One of the advantages of using C++ + MPI is that the combination is quite widely used in scientific and technical computing, though my impression is that within this niche dynamic process management is not used very much. Since MPI is used on the very largest supercomputers tackling the bleeding-edge problems of computational science, one might hazard a guess that it would be fast enough for your purposes.
One of the disadvantages of using C++ + MPI is that MPI was not designed to tolerate failure of processes during execution. There is debate on SO about whether or not the dynamic process management features allow you to program your own fault tolerance, but there is no debate that doing so might be difficult.
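If you do end up programming your own fault tolerance, the part that is independent of MPI is the master's bookkeeping: remember which tasks each worker holds, and put them back in the queue when that worker is declared dead. Below is a minimal sketch of that idea in plain C++. The `TaskPool` class and its method names are hypothetical, not part of MPI or any library, and how you detect a dead worker (timeout, error code from a send/receive, etc.) is left out entirely.

```cpp
#include <cstddef>
#include <deque>
#include <optional>
#include <unordered_map>
#include <unordered_set>

// Hypothetical master-side bookkeeping (illustrative names, not an MPI API).
// Tasks are plain indices; workers are identified by an integer id.
class TaskPool {
public:
    explicit TaskPool(std::size_t task_count) : total_(task_count) {
        for (std::size_t id = 0; id < task_count; ++id) pending_.push_back(id);
    }

    // Hand the next pending task to a worker; empty if nothing is pending.
    std::optional<std::size_t> assign(int worker) {
        if (pending_.empty()) return std::nullopt;
        std::size_t task = pending_.front();
        pending_.pop_front();
        in_flight_[worker].insert(task);
        return task;
    }

    // Record a result. Returns false for a stale result, i.e. one for a
    // task this worker no longer holds (for example after it was dropped).
    bool complete(int worker, std::size_t task) {
        auto it = in_flight_.find(worker);
        if (it == in_flight_.end() || it->second.erase(task) == 0) return false;
        ++done_;
        return true;
    }

    // The worker was declared dead: requeue everything it was holding.
    // Returns the number of tasks put back into the pending queue.
    std::size_t worker_lost(int worker) {
        auto it = in_flight_.find(worker);
        if (it == in_flight_.end()) return 0;
        std::size_t requeued = it->second.size();
        for (std::size_t task : it->second) pending_.push_back(task);
        in_flight_.erase(it);
        return requeued;
    }

    std::size_t pending() const { return pending_.size(); }
    bool finished() const { return done_ == total_; }

private:
    std::size_t total_;
    std::size_t done_ = 0;
    std::deque<std::size_t> pending_;
    std::unordered_map<int, std::unordered_set<std::size_t>> in_flight_;
};
```

This only works because your tasks are independent: a requeued task can simply be run again on another worker. Rejecting stale results in `complete` covers the case where a worker you gave up on turns out to be merely slow and reports back later.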
You would get the first 3 of your requirements 'out-of-the-box'. As for encrypted and authenticated communication, you'd have to do most of that yourself; MPI just passes messages around. I'd guess that for most MPI users, running parallel applications on clusters or supercomputers with private interconnects (often themselves isolated from corporate or enterprise networks), encryption and authentication are matters of little concern.