
Perform a for-loop in parallel in Python 3.2 [duplicate]

Possible Duplicate:
how do I parallelize a simple python loop?

I'm quite new to Python (using Python 3.2) and I have a question concerning parallelisation. I have a for-loop that I wish to execute in parallel using "multiprocessing" in Python 3.2:

def computation(i, j):
    global output

    for x in range(i,j):
        localResult = ... #perform some computation as a function of i and j
        output.append(localResult)

In total, I want to perform this computation for a range of i=0 to j=100. Thus I want to create a number of processes that each call the function "computation" with a subdomain of the total range. Any ideas on how to do this? Is there a better way than using multiprocessing?

More specifically, I want to perform a domain decomposition and I have the following code:

from multiprocessing import Pool

class testModule:

    def __init__(self):
        self

    def computation(self, args):
        start, end = args
        print('start: ', start, ' end: ', end)

testMod = testModule()
length = 100
np=4
p = Pool(processes=np)
p.map(testMod.computation, [(length, startPosition, length//np) for startPosition in range(0, length, length//np)])

I get an error message mentioning PicklingError. Any ideas what could be the problem here?

user1499144 asked Jul 24 '12 13:07



2 Answers

Joblib is designed specifically to wrap around multiprocessing for the purposes of simple parallel looping. I suggest using that instead of grappling with multiprocessing directly.

The simple case looks something like this:

from joblib import Parallel, delayed
Parallel(n_jobs=2)(delayed(foo)(i**2) for i in range(10))  # n_jobs = number of processes

The syntax is simple once you understand it. We are using generator syntax: delayed(foo) wraps foo so that the parentheses that follow record the arguments for a later call instead of calling foo immediately, and Parallel then runs the recorded calls in the worker processes.
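To make the role of delayed more concrete, here is a minimal stdlib-only sketch of the idea. It is a simplified stand-in, not joblib's actual code: the names delayed_sketch and run_serially are hypothetical, foo is a placeholder function, and the recorded calls are executed one after another rather than in parallel.

```python
# Simplified stand-in for the idea behind joblib's delayed();
# delayed_sketch and run_serially are hypothetical names, not part of joblib.
def delayed_sketch(func):
    """Return a wrapper that records a call instead of making it."""
    def record(*args, **kwargs):
        return func, args, kwargs
    return record

def run_serially(tasks):
    """Execute the recorded calls one by one and collect results in order."""
    return [func(*args, **kwargs) for func, args, kwargs in tasks]

def foo(x):
    # Placeholder for the real per-iteration work.
    return x + 1

# Building the generator computes nothing; foo only runs inside run_serially.
tasks = (delayed_sketch(foo)(i**2) for i in range(4))
results = run_serially(tasks)
print(results)  # [1, 2, 5, 10]
```

With joblib itself, Parallel plays the role of run_serially but hands the recorded calls to worker processes.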

In your case, you should either rewrite your for loop with generator syntax, or define another function (i.e. a 'worker' function) to perform the operations of a single loop iteration and place that into the generator syntax of a call to Parallel.

In the latter case, you would do something like:

Parallel(n_jobs=2)(delayed(foo)(parameters) for x in range(i,j))

where foo is a function you define to handle the body of your for loop. Note that you do not want to append to a list, since Parallel is returning a list anyway.

Louis Thibault answered Sep 21 '22 20:09


In this case, you probably want to define a simple function to perform the calculation and get localResult.

def getLocalResult(args):
    """ Do whatever you want in this func.  
        The point is that it takes x,i,j and 
        returns localResult
    """
    x,i,j = args  #unpack args
    return doSomething(x,i,j)

Now in your computation function, you just create a pool of workers and map the local results:

import multiprocessing
def computation(np=4):
    """ np is number of processes to fork """
    p = multiprocessing.Pool(np)
    # i and j are assumed to be defined at module level
    output = p.map(getLocalResult, [(x,i,j) for x in range(i,j)] )
    return output

I've removed the global here because it's unnecessary (globals are usually unnecessary). In your calling routine you should just do output.extend(computation(np=4)) or something similar.
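Putting the two pieces together, a self-contained version might look like the sketch below. It rests on a few assumptions: do_something is a hypothetical stand-in for the real computation, i and j are passed as arguments rather than read from an outer scope, and the with form of Pool needs Python 3.3 or later (on 3.2, create the pool and call close() and join() yourself).

```python
import multiprocessing

def do_something(x, i, j):
    # Hypothetical stand-in for the real per-iteration computation.
    return (x - i) * (j - x)

def get_local_result(args):
    x, i, j = args  # Pool.map passes one argument per task, so unpack the tuple
    return do_something(x, i, j)

def computation(i, j, nprocs=4):
    """Compute one result per x in range(i, j) using nprocs worker processes."""
    tasks = [(x, i, j) for x in range(i, j)]
    with multiprocessing.Pool(nprocs) as pool:
        # map blocks until all tasks finish and returns results in input order
        return pool.map(get_local_result, tasks)

if __name__ == "__main__":
    output = []
    output.extend(computation(0, 100))
    print(len(output), output[:5])
```

The __main__ guard matters on platforms where worker processes re-import the main module instead of being forked (Windows, for example).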

EDIT

Here's a "working" example of your code:

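from multiprocessing import Pool

def computation(args):
    # each task is a (start, end, npoints) tuple describing one subdomain
    start, end, npoints = args
    print(args)

if __name__ == '__main__':  # needed where workers re-import this module (e.g. Windows)
    length = 100
    np = 4
    p = Pool(processes=np)
    tasks = [(start, start + length//np, length//np) for start in range(0, length, length//np)]
    p.map(computation, tasks)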
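To go from printing each subdomain to collecting results, one possibility is to have each worker return the values for its subrange and flatten them in the parent. The sketch below is only an illustration: split_range and compute_chunk are hypothetical names, squaring each index is a placeholder computation, and the with form of Pool assumes Python 3.3 or later. Unlike the length//np stepping above, split_range also copes with a length that is not divisible by the number of processes.

```python
from multiprocessing import Pool

def split_range(start, stop, nchunks):
    """Split [start, stop) into nchunks contiguous (lo, hi) pairs."""
    size, extra = divmod(stop - start, nchunks)
    bounds = []
    lo = start
    for k in range(nchunks):
        hi = lo + size + (1 if k < extra else 0)  # first `extra` chunks get one more item
        bounds.append((lo, hi))
        lo = hi
    return bounds

def compute_chunk(bounds):
    lo, hi = bounds
    # Hypothetical placeholder computation: square each index in the subdomain.
    return [x * x for x in range(lo, hi)]

if __name__ == "__main__":
    nprocs = 4
    with Pool(processes=nprocs) as pool:
        per_chunk = pool.map(compute_chunk, split_range(0, 100, nprocs))
    # Flatten the per-chunk lists; map preserves the order of the subdomains.
    output = [value for chunk in per_chunk for value in chunk]
    print(len(output))
```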

Note that what you had didn't work because you were using an instance method as your function. multiprocessing starts new processes and passes information between them via pickle, so only objects that can be pickled can be used, and Python 3.2 cannot pickle a bound instance method. Using an instance method doesn't really make sense here anyway: each process works on a copy of the parent's state, so any changes to state made in the worker processes do not propagate back to the parent.
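The last point can be checked with a small sketch. The names seen and record_and_square are hypothetical, and the with form of Pool assumes Python 3.3 or later. The module-level list plays the role of the global output list from the question.

```python
from multiprocessing import Pool

seen = []  # module-level state, analogous to the question's global `output`

def record_and_square(x):
    seen.append(x)  # modifies the list of whichever process runs this call
    return x * x    # the return value is what Pool.map sends back

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        results = pool.map(record_and_square, range(5))
    print(results)  # [0, 1, 4, 9, 16]
    print(seen)     # [] in the parent: the workers appended to their own copies
```

This is why the worker should return its result and let map collect it, rather than appending to a shared list.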

mgilson answered Sep 17 '22 20:09