 

How do I use an initializer to set up my multiprocessing pool?

I'm trying to use the multiprocessing Pool object. I'd like each process to open a database connection when it starts, then use that connection to process the data that is passed in (rather than opening and closing the connection for each bit of data). This seems like what the initializer is for, but I can't wrap my head around how the worker and the initializer communicate. So I have something like this:

    from multiprocessing import Pool

    import psycopg2

    def get_cursor():
        return psycopg2.connect(...).cursor()

    def process_data(data):
        # here I'd like to have the cursor so that I can do things with the data
        pass

    if __name__ == "__main__":
        pool = Pool(initializer=get_cursor, initargs=())
        pool.map(process_data, get_some_data_iterator())

How do I (or can I) get the cursor returned by get_cursor() into process_data()?

Chris Curvey asked Apr 12 '12 03:04





1 Answer

The initializer function is called like this inside each pool worker:

    def worker(...):
        ...
        if initializer is not None:
            initializer(*initargs)

so there is no return value saved anywhere. You might think this dooms you, but no! Each worker is in a separate process. Thus, you can use an ordinary global variable.

This is not exactly pretty, but it works:

    cursor = None

    def set_global_cursor(...):
        global cursor
        cursor = ...

Now you can just use cursor in your process_data function. The cursor variable inside each separate process is separate from all the other processes, so they do not step on each other.
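To make the pattern concrete, here is a minimal end-to-end sketch. It is not the asker's actual setup: sqlite3 with an in-memory database stands in for psycopg2.connect(...) so the example runs without a database server, and the names init_worker, _cursor, and the db_path argument are made up for illustration.

```python
import sqlite3
from multiprocessing import Pool

# Module-level global. Every worker process has its own copy,
# so workers never share or overwrite each other's cursor.
_cursor = None


def init_worker(db_path):
    """Runs once in each worker process when the pool starts it."""
    global _cursor
    # Assumption: sqlite3 is only a stand-in for psycopg2.connect(...).
    _cursor = sqlite3.connect(db_path).cursor()


def process_data(value):
    """Uses the cursor that init_worker stored in this process."""
    _cursor.execute("SELECT ? * 2", (value,))
    return _cursor.fetchone()[0]


if __name__ == "__main__":
    # initargs is unpacked into init_worker's positional arguments.
    with Pool(processes=2, initializer=init_worker,
              initargs=(":memory:",)) as pool:
        print(pool.map(process_data, range(5)))
```

The return value of init_worker is still thrown away; the only thing that survives is the assignment to the global, which process_data then reads in the same process. The `if __name__ == "__main__":` guard matters on platforms where workers are started by re-importing the main module.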

(I have no idea whether psycopg2 has a different way to deal with this that does not involve using multiprocessing in the first place; this is meant as a general answer to a general problem with the multiprocessing module.)

torek answered Sep 28 '22 02:09
