Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

multiprocessing module and distinct psycopg2 connections

I am very puzzled as to the behavior of some multiprocessing code that is using psycopg2 to make queries in parallel to a postgres db.

Essentially, I am making the same query (with different params) to various partitions of a larger table. I am using multiprocessing.Pool to fork off a separate query.

My multiprocessing call looks like this:

pool = Pool(processes=num_procs)
results=pool.map(run_sql, params_list)

My run_sql code looks like this:

def run_sql(zip2):
    conn = get_connection()
    curs = conn.cursor()
    print "conn: %s curs:%s pid=%s" % (id(conn), id(curs), os.getpid())
    ...
    curs.execute(qry)
    records = curs.fetchall()

def get_connection()
    ...
    conn = psycopg2.connect(user=db_user, host=db_host, 
                         dbname=db_name, password=db_pwd)

    return conn

So my expectation is that each process would get a separate db connection via the call to get_connection() and that print id(conn) would display a distinct value. However, that doesn't seem to be the case and I am at a loss to explain it. Even print id(curs) is the same. Only print os.getpid() shows a difference. Does it somehow use the same connection for each forked process ?

conn: 4614554592 curs:4605160432 pid=46802
conn: 4614554592 curs:4605160432 pid=46808
conn: 4614554592 curs:4605160432 pid=46810
conn: 4614554592 curs:4605160432 pid=46784
conn: 4614554592 curs:4605160432 pid=46811
like image 692
femibyte Avatar asked Jul 15 '16 20:07

femibyte


1 Answers

I think I've figured this out. The answer lies in the fact that multiprocessing in Python is shared-nothing so the entire memory space is copied, functions and all. Hence for each process, even though the pid is different, the memory spaces are copies of each other and the address of the connection within the memory space ends up being the same. The same reason is why declaring a global connection pool as I did initially was useless, each process ended up with its own connection pool with just 1 connection active at a time.

like image 127
femibyte Avatar answered Nov 02 '22 18:11

femibyte