
Distributed programming in Python [closed]

I plan to program a simple data-flow framework, which basically consists of lazy method calls on objects. If I ever consider distributed programming, what is the easiest way to enable it in Python? Is there any transparent solution that spares me from doing network programming?

Or for a start, how can I make use of multi-core processors in Python?

asked Apr 22 '12 by Gerenuk



2 Answers

lazy method calls of objects

This can be anything at all, really, so let's break it down:

Simple Let-Me-Call-That-Function (RPC)

Well, lucky you! Python has one of the greatest implementations of Remote Procedure Calls: RPyC.

Just run the server (double-click a file; see the tutorial), then open an interpreter and:

import rpyc

# connect to a classic RPyC server running locally
conn = rpyc.classic.connect("localhost")
# instantiate an object from a module that lives on the server side
data_obj = conn.modules.lazyme.AwesomeObject("ABCDE")
print(data_obj.calculate(10))
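
The example assumes a module named lazyme importable on the server side; a minimal, hypothetical sketch of what such a module might look like:

# lazyme.py -- hypothetical module on the server's import path
class AwesomeObject(object):
    def __init__(self, data):
        self.data = data

    def calculate(self, n):
        # placeholder "work"; anything at all could happen here
        return len(self.data) * n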

And a lazy version (async):

# wrap the remote method with async_(), which makes the invocation asynchronous
# (newer RPyC versions spell it async_, since async became a reserved keyword)
acalc = rpyc.async_(data_obj.calculate)
res = acalc(10)
print(res.ready, res.value)  # ready is True once the result has arrived;
                             # value fetches it (blocking if necessary)
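
The returned AsyncResult can also be blocked on with res.wait(), or handed a completion callback via res.add_callback(), if polling res.ready isn't convenient.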

Simple Data Distribution

You have a defined unit of work, say a complex image manipulation. What you do, roughly, is create Nodes that do the actual work (i.e., take an image, perform the manipulation, and return the result), someone who collects the results (a Sink), and someone who creates the work (the Distributor).

Take a look at Celery.
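
A minimal Celery sketch of that pattern (the task body and the Redis broker URL are assumptions; any supported broker works):

# tasks.py -- a Node's unit of work, registered as a Celery task
from celery import Celery

app = Celery("tasks",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/0")

@app.task
def manipulate(image_name):
    # the actual image manipulation would go here
    return image_name.upper()

Start a worker with celery -A tasks worker, and the Distributor/Sink side is just:

result = manipulate.delay("photo.png")  # hand the job to a worker
print(result.get(timeout=10))           # block until the result comes back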

If it's very small scale, or if you just want to play with it, see the Pool object in the multiprocessing package:

from multiprocessing import Pool

def f(x):
    return x * x

# note: f must be defined before the Pool is created,
# so the worker processes can find it
p = Pool(5)
print(p.map(f, [1, 2, 3]))

And the truly-lazy version:

res = p.map_async(f, [1, 2, 3])  # returns immediately
print(res.get())                 # blocks until all results are ready

map_async returns an AsyncResult object, which can be polled with ready() or waited on with get().

Complex Data Distribution

Some multi-level, more-than-just-fire-and-forget data manipulation, or a multi-step processing use case.

In such cases, you should use a Message Broker such as ZeroMQ or RabbitMQ. They allow you to send 'messages' across multiple servers with great ease.

They save you from the horrors of raw TCP, but they are a bit more complex (some, like RabbitMQ, require a separate process/server for the broker). However, they give you much more fine-grained control over the flow of data, and help you build a truly scalable application.
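
For a taste, a minimal pyzmq PUSH/PULL sketch (port number and message shape are arbitrary): a distributor fans jobs out, and any number of workers connect and share the load.

# distributor.py -- fans jobs out over a PUSH socket
import zmq

context = zmq.Context()
sender = context.socket(zmq.PUSH)
sender.bind("tcp://*:5557")
for job in range(10):
    sender.send_json({"job": job})

# worker.py -- run as many of these as you like
import zmq

context = zmq.Context()
receiver = context.socket(zmq.PULL)
receiver.connect("tcp://localhost:5557")
while True:
    msg = receiver.recv_json()  # blocks until a job arrives
    print("working on", msg["job"])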

Lazy-Anything

While not data distribution per se, it is the hottest trend in web-server back-ends: use 'green' threads (or events, or coroutines) to delegate IO-heavy tasks to a dedicated thread, while the application code is busy maxing out the CPU.

I like Eventlet a lot, and gevent is another option.
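
For instance, a minimal gevent sketch that runs several IO-bound downloads concurrently (the URL is just a placeholder):

from gevent import monkey
monkey.patch_all()  # make blocking stdlib IO cooperative

import gevent
import urllib.request

def fetch(url):
    # while one greenlet waits on the network, the others run
    return urllib.request.urlopen(url).read()[:60]

jobs = [gevent.spawn(fetch, "http://example.com") for _ in range(3)]
gevent.joinall(jobs, timeout=10)
print([job.value for job in jobs])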

answered Sep 18 '22 by Ohad


Try Gearman http://gearman.org/

Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.
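
A minimal sketch using the python-gearman client library (assumes a gearmand server on the default port; the task name is arbitrary):

# worker.py -- registers a task with the gearman job server
import gearman

def reverse(gearman_worker, gearman_job):
    return gearman_job.data[::-1]

worker = gearman.GearmanWorker(["localhost:4730"])
worker.register_task("reverse", reverse)
worker.work()  # blocks, processing jobs as they arrive

# client.py -- submits a job and waits for the result
import gearman

client = gearman.GearmanClient(["localhost:4730"])
request = client.submit_job("reverse", "Hello Gearman")
print(request.result)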

answered Sep 18 '22 by Ankur Gupta