
Simple non-network concurrency with Twisted

I'm having trouble using Twisted for simple concurrency in Python. The problem is that I don't know how to do it, and all the online resources I can find are about Twisted's networking abilities. So I am turning to the SO gurus for some guidance.

Python 2.5 is used.

A simplified version of my problem runs as follows:

  1. A bunch of scientific data
  2. A function that munches on the data and creates output
  3. ??? <- this is where concurrency comes in: it takes chunks of data from 1 and feeds them to 2
  4. Output from 3 is joined and stored

My guess is that the Twisted reactor can do job number three. But how?

Thanks a lot for any help and suggestions.

upd1:

Simple example code. I have no idea how the reactor deals with processes, so I have given it imaginary functions:

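datum = 'abcdefg'

def dataServer(data):
    # Hand out the data one item at a time.
    for char in data:
        yield char

def dataWorker(chara):
    # The "computation": turn one character into its code point.
    return ord(chara)

# reactor(), working(), addTask() and finishedProcesses() are imaginary.
r = reactor()
NUMBER_OF_PROCESSES_AV = 4
serv = dataServer(datum)
id = 0
result = [None] * len(datum)

while r.working():
    if NUMBER_OF_PROCESSES_AV > 0:
        # Hand the next item to a free process, tagged with its position.
        r.addTask(dataWorker, serv.next(), id)
        NUMBER_OF_PROCESSES_AV -= 1
        id += 1
    for pr, done_id in r.finishedProcesses():
        # Store the result in its slot and free the process again.
        result[done_id] = pr
        NUMBER_OF_PROCESSES_AV += 1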
Rince asked Dec 29 '22 21:12


2 Answers

As Jean-Paul said, Twisted is great for coordinating multiple processes. However, if you don't specifically need Twisted and simply need a distributed processing pool, there may be better-suited tools out there.

One I can think of which hasn't been mentioned is Celery. Celery is a distributed task queue: you set up a queue of tasks backed by a database, Redis, or RabbitMQ (you can choose from a number of free software options) and write a number of compute tasks. These can be arbitrary scientific-computing tasks. Tasks can spawn subtasks, which lets you implement the "joining" step you mention above. You then start as many workers as you need and compute away.

I'm a heavy user of both Twisted and Celery, and either option is a good choice.

rlotun answered Dec 31 '22 12:12


To actually compute things concurrently, you'll probably need to employ multiple Python processes. A single Python process can interleave calculations, but it won't execute them in parallel (with a few exceptions).

Twisted is a good way to coordinate these multiple processes and collect their results. One library oriented towards solving this task is Ampoule. You can find more information about Ampoule on its Launchpad page: https://launchpad.net/ampoule.
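
To make the "multiple Python processes" idea concrete, here is a minimal sketch that uses only the standard library's subprocess module as a stand-in; it is not Twisted or Ampoule code. It assumes a current Python 3 interpreter rather than the Python 2.5 from the question, and the names CHILD_CODE, split_into_chunks and process_in_parallel are made up for illustration. The worker is the ord() computation from the question.

```python
import json
import subprocess
import sys

# Code run by each child process (hypothetical worker): take one chunk
# from the command line, apply ord() to every character, and print the
# result as JSON on stdout.
CHILD_CODE = (
    "import json, sys\n"
    "chunk = sys.argv[1]\n"
    "print(json.dumps([ord(ch) for ch in chunk]))\n"
)


def split_into_chunks(data, n_chunks):
    """Split data into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def process_in_parallel(data, n_processes=4):
    """Run one child interpreter per chunk and join the outputs in order."""
    chunks = split_into_chunks(data, n_processes)
    # Start every child before waiting on any of them, so the chunks
    # are processed at the same time in separate processes.
    children = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE, chunk],
            stdout=subprocess.PIPE,
        )
        for chunk in chunks
    ]
    results = []
    for child in children:
        out, _ = child.communicate()  # wait for the child, read its stdout
        if child.returncode != 0:
            raise RuntimeError("worker process failed")
        results.extend(json.loads(out.decode()))
    return results


if __name__ == "__main__":
    print(process_in_parallel("abcdefg"))
```

The parent plays the role of step 3 in the question: it cuts the data into chunks, hands each chunk to a separate interpreter, and joins the outputs in their original order. A coordinator such as Twisted would take over the start-and-collect loop, with the difference that it would not block while waiting for each child.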

Jean-Paul Calderone answered Dec 31 '22 11:12