I'm currently investigating topics for graduate studies in Computer Science and I've found a relatively large area of interest, Distributed Computing, that I'd like to get more information on. There are a handful of other questions [1,2,3] on StackOverflow that address similar matters, but not necessarily the question I'm going to ask, specifically related to the languages I'm looking for.
I've searched the web and found plenty of papers, articles, and even courses, such as this course from Rutgers, describing the theory and mechanics behind Distributed Computing. Unfortunately, most of these papers and courses I've found are fairly limited on describing the actual concepts of Distributed Computing in code. I'm looking for websites that can give me an introduction to the programming parts of Distributed Computing. (Preferably in C or Python.)
As a side note, I'd like to mention that this may even be more specifically towards how Parallel Computing fits into the field of Distributed Computing. (I haven't taken a course in either yet!)
Disclamer: I am a developer of SCOOP.
It really depends on your personality. If you prefer getting theoretical information before moving forth, you should read some books or get along with the technologies first. A list of books covering a good part of the subject would be:
Data-based technologies you may want to get acquainted with would be the MPI standard (for multi-computers) and OpenMP (for single-computer), as well as the pretty good multiprocessing module which is builtin in Python.
If you prefer getting your hands dirty first, you should begin with task-based frameworks which provides a simple and user-friendly usage. Both of these were an utmost focus while creating SCOOP. You can try it with pip -U scoop
. On Windows, you may wish to install PyZMQ first using their executable installers. You can check the provided examples and play with the various parameters to understand what causes performance degradation or increase with ease. I encourage you to compare it to its alternatives such as Celery for similar work or Gevent for a coroutine framework. If you feel adventurous, don't be shy to test the builtin coroutines functionnalities of Python and plug them with various networking stacks.
Using a task-based framework will ease you the burden of theoretical analysis such as load balancing implementation details, serialization and so on which is non-trivial and can take a long time to debug and get working. It provides all the desired level of understanding of distributed systems. Bonus with open source software: Check the code to understand under-the-hood mechanical details.
I have had good experiences using the built in packages for python on a single machine. My friend has had great success using ipython on a machine with 128 cores.
Now there are different kinds of distributed computing like on clusters, clouds, or any machine on the internet like folding@home (including PS3s!) Don't forget about GPUs as well!
Some Python Links:
Various Python libraries
Ipython
Python and Parallel Computing presentation
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With