
python multiprocessing - Best way to initialize/pass database connection to be used across processes

I am having difficulty passing a database connection object or cursor object through pool.map in the Python multiprocessing package. Basically, I want to create a pool of workers, each with its own state and its own db connection, so that they can execute queries in parallel.

I have tried the following approaches, but both give me a PicklingError in Python:

Pool Map with 2 arguments

Use Initializer to set up multiprocess pool

The second link is exactly what I need to do, meaning I'd like each process to open a database connection when it starts, then use that connection to process the data/args that are passed in.

Here is my code.

import multiprocessing as mp
from itertools import repeat

def process_data((id,db)):
  print 'in processdata'
  cursor = db.cursor()
  query = ....
  #cursor.execute(query)
  #....
  .....
  .....
  return row

if __name__ == '__main__':

  db = getConnection()
  cursor = db.cursor()
  print 'Initialised db connection and cursor'
  inputs = [1,2,3,4,5]
  pool = mp.Pool(processes=2)
  result_list = pool.map(process_data,zip(inputs,repeat(db)))
  #print result_list
  pool.close()
  pool.join()

This results in the following error:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 225, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'module'>: attribute lookup __builtin__.module failed

I assume the db or cursor object is not picklable, because the code works if I replace repeat(db) with repeat(x), where x is an int or a string. I have also tried using the initializer function. It seems to work at first, but queries then behave strangely: many return nothing for an id even though data is present for it.

What would be the best way to achieve this? I am using Python 2.6.6 on a Linux machine.

asked Sep 27 '12 04:09 by Angela


1 Answer

I'm going to go ahead and put my comment up as an answer, because I think it's appropriate as one. Don't try to pass database connections from the parent process to the child processes: everything handed to pool.map is pickled before it is sent to a worker, and a connection object generally can't be pickled. Pass only static data or other objects that can be serialized, such as ids or rows of data. Alternatively, have each child establish its own database connection when it needs one.
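
As a rough illustration of the second option, here is a minimal sketch in which each worker opens its own connection in a pool initializer and only ids travel between processes. It is written for Python 3, and several things in it are assumptions rather than anything from the question: sqlite3 stands in for whatever driver getConnection() wraps, and the items table and the names init_worker, fetch_row, make_demo_db and run_queries are hypothetical. The "fork" start method is also assumed because the question is about Linux.

```python
import multiprocessing as mp
import os
import sqlite3
import tempfile

# Per-process global: every worker gets its own connection from init_worker.
_conn = None


def init_worker(db_path):
    """Runs once in each worker process, right after the worker starts."""
    global _conn
    # Assumption: sqlite3 is only a stand-in for your real driver's connect().
    _conn = sqlite3.connect(db_path)


def fetch_row(item_id):
    """Query through this worker's own connection; only the id was pickled."""
    cursor = _conn.cursor()
    cursor.execute("SELECT id, name FROM items WHERE id = ?", (item_id,))
    return cursor.fetchone()  # a plain tuple (or None), which pickles fine


def make_demo_db(db_path):
    """Create a small hypothetical table so the sketch is self-contained."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(i, "item-%d" % i) for i in range(1, 6)],
    )
    conn.commit()
    conn.close()


def run_queries(db_path, ids, processes=2):
    """Map ids over a pool whose workers each open their own connection."""
    # Assumption: "fork" is used because the question targets Linux.
    ctx = mp.get_context("fork")
    pool = ctx.Pool(processes=processes,
                    initializer=init_worker, initargs=(db_path,))
    try:
        return pool.map(fetch_row, ids)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    make_demo_db(path)
    print(run_queries(path, [1, 2, 3, 4, 5]))
```

The point of the design is that the connection is never an argument or a return value of anything passed through pool.map, so nothing unpicklable crosses the process boundary. Only the database path, the ids and the result tuples do.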

answered Oct 26 '22 23:10 by g.d.d.c