I am having some difficulty in passing a database connection object or the cursor object using pool.map in the Python multiprocesing package. Basically, I want to create a pool of workers each with its own state and a db connection, so that they can execute queries in parallel.
I have tried these approaches, but I am getting a picklingerror in python with them -
Pool Map with 2 arugements
Use Initializer to set up multiprocess pool
The second link is exactly what I need to do, meaning I'd like each process to open a database connection when it starts, then use that connection to process the data/args that are passed in.
Here is my code.
import multiprocessing as mp
def process_data((id,db)):
print 'in processdata'
cursor = db.cursor()
query = ....
#cursor.execute(query)
#....
.....
.....
return row
`if __name__ == '__main__':
db = getConnection()
cursor = db.cursor()
print 'Initialised db connection and cursor'
inputs = [1,2,3,4,5]
pool = mp.Pool(processes=2)
result_list = pool.map(process_data,zip(inputs,repeat(db)))
#print result_list
pool.close()
pool.join()
`
This results in the following error -
`Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib/python2.6/multiprocessing/pool.py", line 225, in _handle_tasks
put(task)
PicklingError: Can't pickle <type 'module'>: attribute lookup __builtin__.module failed`
I guess the db or the cursor object is not picklable according to python, because if I replace repeat(db), to repeat(x) where x is an int or string , it works. I have tried using the initializer function and it seems to work, initially but weird things happen when I execute queries, many return nothing for an id, when there is data present.
What would be the best way to achieve this? I am using python 2.6.6 on a linux machine.
I'm going to go ahead and put my comment up as an answer, because I think it's appropriate as one. You don't want to try to pass database connections from your parent process to your children processes. You want to move static data or other objects that can be serialized to your children processes. You can pass rows of data, etc. Or you want to have your children establish their own database connections when they become necessary.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With