I'm trying to use the multiprocessing Pool object. I'd like each process to open a database connection when it starts, then use that connection to process the data that is passed in (rather than opening and closing the connection for each bit of data). This seems like what the initializer is for, but I can't wrap my head around how the worker and the initializer communicate. So I have something like this:
from multiprocessing import Pool
import psycopg2

def get_cursor():
    return psycopg2.connect(...).cursor()

def process_data(data):
    pass  # here I'd like to have the cursor so that I can do things with the data

if __name__ == "__main__":
    pool = Pool(initializer=get_cursor, initargs=())
    pool.map(process_data, get_some_data_iterator())
How do I (or do I) get the cursor back from get_cursor() into process_data()?
Pool allows multiple jobs per process, which may make it easier to parallelize your program. If you have a number of jobs to run in parallel, you can make a Pool with as many processes as there are CPU cores and then pass the list of jobs to pool.map.
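A minimal sketch of that pattern (the square function here is just a placeholder job):

from multiprocessing import Pool, cpu_count

def square(n):
    # Placeholder job; any picklable top-level function works here.
    return n * n

if __name__ == "__main__":
    # One worker per CPU core; map() splits the job list across them.
    with Pool(processes=cpu_count()) as pool:
        print(pool.map(square, range(10)))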
Pool.apply_async is like Python's built-in apply, except that the call returns immediately instead of waiting for the result. An AsyncResult object is returned; you call its get() method to retrieve the result of the function call.
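For instance, a small sketch (slow_add is a placeholder function standing in for a longer-running task):

from multiprocessing import Pool

def slow_add(a, b):
    # Placeholder for a longer-running task.
    return a + b

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        # apply_async returns an AsyncResult immediately...
        result = pool.apply_async(slow_add, (2, 3))
        # ...and get() blocks until the worker has finished.
        print(result.get())  # prints 5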
In the example below, we first import the Process class and then create a Process object with the display() function. The process is started with the start() method and completed with the join() method. We can also pass arguments to the function using the args keyword.
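A sketch of such an example (the display function and its argument are placeholders):

from multiprocessing import Process

def display(name):
    # Runs in the child process; name arrives via the args keyword.
    print("Hello,", name)

if __name__ == "__main__":
    p = Process(target=display, args=("world",))
    p.start()   # launch the child process
    p.join()    # wait for it to finish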
Python provides the ability to create and manage new processes via the multiprocessing.Process class. In multiprocessing programming, we may need to change the technique used to start child processes; this is called the start method.
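The start method can be chosen explicitly before any processes are created; here is a sketch (report is a placeholder function):

import multiprocessing as mp
import os

def report():
    print("child pid:", os.getpid())

if __name__ == "__main__":
    # Pick the start method explicitly; "spawn", "fork" and "forkserver"
    # are the options, and availability depends on the platform.
    mp.set_start_method("spawn")
    p = mp.Process(target=report)
    p.start()
    p.join()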
The initializer function is called like this:
def worker(...):
    ...
    if initializer is not None:
        initializer(*args)
so there is no return value saved anywhere. You might think this dooms you, but no! Each worker is in a separate process. Thus, you can use an ordinary global variable.
This is not exactly pretty, but it works:
cursor = None

def set_global_cursor(...):
    global cursor
    cursor = ...
Now you can just use cursor in your process_data function. The cursor variable inside each separate process is separate from all the other processes, so they do not step on each other.
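Putting it together for the original question, a sketch might look like the following; the init_worker and process_data bodies, the connection string, and the query are all placeholders rather than a tested psycopg2 recipe:

from multiprocessing import Pool
import psycopg2

connection = None
cursor = None

def init_worker():
    # Runs once in each worker process; the globals it sets are private
    # to that process, so workers do not share a connection.
    global connection, cursor
    connection = psycopg2.connect("dbname=example")  # placeholder DSN
    cursor = connection.cursor()

def process_data(data):
    # cursor was set by init_worker in this same process.
    cursor.execute("SELECT %s", (data,))  # placeholder query
    return cursor.fetchone()

if __name__ == "__main__":
    with Pool(initializer=init_worker) as pool:
        print(pool.map(process_data, range(10)))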
(I have no idea whether psycopg2 has a different way to deal with this that does not involve using multiprocessing in the first place; this is meant as a general answer to a general problem with the multiprocessing module.)