
Where to begin with Distributed Computing / Parallel Processing? (Python / C) [closed]

I'm currently investigating topics for graduate studies in Computer Science and I've found a relatively large area of interest, Distributed Computing, that I'd like to get more information on. There are a handful of other questions [1,2,3] on Stack Overflow that address similar matters, but not quite the question I'm going to ask, specifically with regard to the languages I'm interested in.

I've searched the web and found plenty of papers, articles, and even courses, such as this course from Rutgers, describing the theory and mechanics behind Distributed Computing. Unfortunately, most of these papers and courses are fairly limited when it comes to expressing the actual concepts of Distributed Computing in code. I'm looking for websites that can give me an introduction to the programming side of Distributed Computing. (Preferably in C or Python.)

As a side note, I'd like to mention that my interest may lean even more specifically towards how Parallel Computing fits into the field of Distributed Computing. (I haven't taken a course in either yet!)

asked Aug 31 '12 by Alex Williams

2 Answers

Disclaimer: I am a developer of SCOOP.

It really depends on your personality. If you prefer absorbing theoretical information before getting your hands on the technologies, you should start with some books. A list of books covering a good part of the subject would be:

  • Parallel Programming: for Multicore and Cluster Systems by Thomas Rauber and Gudula Rünger (Springer-Verlag)
  • Principles of Parallel Programming by Calvin Lin and Lawrence Snyder (Addison-Wesley)
  • Patterns for Parallel Programming by Timothy G. Mattson et al. (Addison-Wesley)

Data-based technologies you may want to get acquainted with include the MPI standard (for multi-computer setups) and OpenMP (for single-computer setups), as well as the very capable multiprocessing module that is built into Python.
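To give a taste of the multiprocessing module mentioned above, here is a minimal single-machine sketch (the function name and worker count are my own choices, not from any particular tutorial):

```python
# Minimal data-parallel example with Python's builtin multiprocessing module.
from multiprocessing import Pool

def square(x):
    """Stand-in for CPU-bound work: squares a number."""
    return x * x

if __name__ == "__main__":
    # Spread the inputs across 4 worker processes.
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Note that `pool.map` preserves input order in its results, which makes it a drop-in parallel replacement for the builtin `map` on picklable functions.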

If you prefer getting your hands dirty first, you should begin with task-based frameworks, which provide a simple and user-friendly interface. Both of these qualities were a primary focus while creating SCOOP. You can try it with pip install -U scoop. On Windows, you may wish to install PyZMQ first using their executable installers. You can check the provided examples and play with the various parameters to easily understand what degrades or improves performance. I encourage you to compare it to its alternatives, such as Celery for similar work or Gevent for a coroutine framework. If you feel adventurous, don't be shy about testing the built-in coroutine functionality of Python and plugging it into various networking stacks.
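A sketch of what such task-based code looks like, using the standard library's concurrent.futures so it runs without any third-party install (to my understanding, SCOOP's own futures module is modeled on this same API, but imported as `from scoop import futures` and launched with `python -m scoop script.py`; check its docs for specifics):

```python
# Task-based parallelism with the stdlib concurrent.futures API.
from concurrent.futures import ProcessPoolExecutor

def work(n):
    # Stand-in task: sum of squares below n.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        # Each input becomes an independent task scheduled on a worker.
        results = list(executor.map(work, [10, 100, 1000]))
    print(results)  # [285, 328350, 332833500]
```

The appeal of this style is that the framework, not you, decides which worker runs which task.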

Using a task-based framework will spare you the burden of implementation details such as load balancing and serialization, which are non-trivial and can take a long time to debug and get working, while still providing the desired level of understanding of distributed systems. Bonus with open source software: check the code to understand the under-the-hood mechanics.
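As one concrete illustration of why serialization is non-trivial: task frameworks must pickle the callables and data they ship to workers, and not every Python object survives that trip (this is a generic stdlib pickle example, not SCOOP-specific code):

```python
# Why serialization bites: only importable, module-level callables pickle.
import pickle

def task(x):
    return x + 1

# A module-level function is pickled by reference and can be sent to a worker.
payload = pickle.dumps(task)

# A lambda has no importable name, so pickle (and hence most frameworks
# built on it) cannot ship it to another process.
try:
    pickle.dumps(lambda x: x + 1)
    shippable = True
except Exception:
    shippable = False
print(shippable)  # False
```

This is why framework examples always define their task functions at module level rather than inline.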

answered Oct 26 '22 by Soravux

I have had good experiences using the built-in packages for Python on a single machine. My friend has had great success using IPython on a machine with 128 cores.

Now, there are different kinds of distributed computing: on clusters, on clouds, or across arbitrary machines on the internet, like Folding@home (which once included PS3s!). Don't forget about GPUs as well!

Some Python links:
  • Various Python libraries
  • IPython
  • Python and Parallel Computing presentation

answered Oct 26 '22 by Onlyjus