 

MPI vs GPU vs Hadoop, what are the major difference between these three parallelism?


I know that some machine learning algorithms, such as random forest, are by nature well suited to parallel implementation. While doing a homework assignment I found these three parallel programming frameworks, so I am interested in knowing what the major differences between these three types of parallelism are.

In particular, if someone can point me to a study comparing the differences between them, that would be perfect!

Please list the pros and cons of each, thanks.

asked Apr 19 '12 by user974270

People also ask

How is parallelism achieved in GPU?

Data parallelism means that each GPU uses the same model to train on a different data subset. In data-parallel training there is no synchronization between GPUs during the forward computation, because each GPU has a full copy of the model, including the network structure and parameters.

Does Hadoop use MPI?

Perhaps Hadoop does not use MPI because MPI usually requires coding in C or Fortran and has a more scientific/academic developer culture, while Hadoop seems to be driven more by IT professionals with a strong Java bias. MPI is very low-level and error-prone, but it allows very efficient use of hardware, RAM, and network.

Is MapReduce parallel processing?

MapReduce is an attractive model for parallel data processing in high-performance cluster computing environments. The scalability of MapReduce is proven to be high, because a job in the MapReduce model is partitioned into numerous small tasks running on multiple machines in a large-scale cluster.


1 Answer

  1. MPI is a message-passing paradigm of parallelism. Here, a root process spawns programs on all the machines in its MPI_COMM_WORLD. All the processes in the system are independent, so the only way they can communicate is through messages over the network. Network bandwidth and throughput are therefore among the most crucial factors in an MPI implementation's performance. Idea: if there is just one process per machine and that machine has many cores, you can use the OpenMP shared-memory paradigm to solve subsets of your problem on a single machine.
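To make the message-passing idea concrete, here is a minimal sketch using the mpi4py bindings (an assumption; the answer does not name a library). Each rank sums its own slice of the data, and a single reduce message combines the partial results at the root:

```python
# Sketch of the MPI pattern: independent processes, communication only via messages.
# The chunk() helper is ordinary Python; the MPI part assumes mpi4py is installed
# and that main() is run under `mpiexec -n 4 python script.py`.

def chunk(data, nprocs, rank):
    """Return the slice of `data` that process `rank` out of `nprocs` handles."""
    n = len(data)
    start = rank * n // nprocs
    stop = (rank + 1) * n // nprocs
    return data[start:stop]

def main():
    from mpi4py import MPI  # imported here so chunk() is usable without MPI
    comm = MPI.COMM_WORLD
    rank, nprocs = comm.Get_rank(), comm.Get_size()

    data = list(range(1000))                 # every rank builds the same input
    local = sum(chunk(data, nprocs, rank))   # ...but sums only its own slice
    total = comm.reduce(local, op=MPI.SUM, root=0)  # the message-passing step
    if rank == 0:
        print("total =", total)

# call main() when launched under mpiexec
```

Note that every rank runs the same program; only `rank` differs, which is how the work is divided without any shared memory.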

  2. CUDA is a SIMT (single instruction, multiple threads) paradigm of parallelism. It uses modern GPU architecture to provide parallelism. A GPU contains blocks of sets of cores working on the same instruction in lock-step fashion (similar to the SIMD model). Hence, if all the threads in your system do a lot of the same work, you can use CUDA. However, the amount of shared memory and global memory on a GPU is limited, so you should not use just one GPU for solving a huge problem.
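To illustrate the lock-step model, here is a plain-Python simulation (not real CUDA) of what a vector-add kernel does: every "thread" executes the same instructions and differs only in its index, which in real CUDA would come from `blockIdx` and `threadIdx`:

```python
# Plain-Python sketch of the SIMT idea behind a CUDA vector-add kernel.
# This is a simulation only; real CUDA kernels are written in CUDA C/C++.

def vector_add_kernel(a, b, c, i):
    """Kernel body: same code for every thread, parameterized only by index i."""
    if i < len(c):          # bounds check, as in a real kernel
        c[i] = a[i] + b[i]

def launch(kernel, n_threads, *args):
    """Simulate launching n_threads copies of the same kernel."""
    for i in range(n_threads):  # on a GPU these run in parallel, in lock-step warps
        kernel(*args, i)

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
c = [0] * 4
launch(vector_add_kernel, 4, a, b, c)   # c becomes [11, 22, 33, 44]
```

If the threads took different branches instead of doing the same work, a real GPU would serialize the divergent paths, which is why CUDA pays off mainly for uniform workloads.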

  3. Hadoop is used for solving large problems on commodity hardware using the MapReduce paradigm. Hence, you do not have to worry about distributing data or managing corner cases. Hadoop also provides a distributed file system, HDFS, for storing data on the compute nodes.
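The MapReduce structure Hadoop executes can be sketched in a few lines. This is the classic word-count example written as plain Python functions (a sketch only: with Hadoop, the map and reduce steps would be separate tasks on different machines, and the framework would perform the shuffle and handle the data distribution):

```python
# Word count in the MapReduce style: map emits (key, value) pairs,
# a shuffle groups them by key, and reduce aggregates each group.
from collections import defaultdict

def map_phase(lines):
    """Map: emit (word, 1) for every word, as a Hadoop mapper would."""
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    """Group values by key; Hadoop does this between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

counts = reduce_phase(shuffle(map_phase(["a b a", "b c"])))
# counts == {"a": 2, "b": 2, "c": 1}
```

Because each phase works on independent keys, Hadoop can run thousands of such tasks across a cluster without the programmer writing any communication code.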




Hadoop, MPI and CUDA are largely orthogonal to each other, so it may not be fair to compare them directly.

That said, you can always use (CUDA + MPI) to solve a problem on a cluster of GPUs. You still need a CPU core on each node to perform the communication part of the problem.
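The hybrid pattern usually maps one MPI rank to one GPU. The sketch below simulates that structure in plain Python (the GPU call is a placeholder; real code would select a device per rank and launch a kernel, with MPI_Reduce combining the partial results):

```python
# Hedged sketch of the CUDA + MPI structure: one rank per GPU.
# gpu_compute stands in for a CUDA kernel; the loop stands in for
# the ranks that would actually run concurrently under MPI.

def gpu_compute(chunk):
    """Placeholder for work that would run as a CUDA kernel on this rank's GPU."""
    return sum(x * x for x in chunk)

def hybrid_job(data, nprocs):
    """Split data across ranks, compute each slice 'on a GPU',
    then combine the partials (an MPI_Reduce in a real program)."""
    partials = []
    for rank in range(nprocs):                       # concurrent in real MPI
        start = rank * len(data) // nprocs
        stop = (rank + 1) * len(data) // nprocs
        partials.append(gpu_compute(data[start:stop]))  # GPU does the math
    return sum(partials)                             # CPU/MPI does the communication
```

The division of labor matches the answer's point: the GPUs do the uniform number-crunching, while a CPU core on each node handles the message passing.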

answered Dec 17 '22 by prathmesh.kallurkar