
Share DB connection in a process pool

I have a Python 3 program that updates a large list of rows based on their ids (in a table in a Postgres 9.5 database).

I use multiprocessing to speed up the process. As Psycopg's connections can’t be shared across processes, I create a connection for each row, then close it.

Overall, multiprocessing is faster than single processing (5 times faster with 8 CPUs). However, creating a connection is slow: I'd like to create just a few connections and keep them open as long as required.

Since .map() chops ids_list into a number of chunks which it submits to the process pool, would it be possible to share a database connection for all ids in the same chunk/process?

Sample code:

from multiprocessing import Pool
import psycopg2


def create_db_connection():
    conn = psycopg2.connect(database=database,
                            user=user,
                            password=password,
                            host=host)
    return conn


def my_function(item_id):
    conn = create_db_connection()

    # Other CPU-intensive operations are done here

    cur = conn.cursor()
    cur.execute("""
        UPDATE table
        SET
        my_column = 1
        WHERE id = %s;
        """,
        (item_id, ))
    cur.close()
    conn.commit()
    conn.close()  # the connection is closed again after every row


if __name__ == '__main__':
    ids_list = []  # Long list of ids

    pool = Pool()  # os.cpu_count() processes
    pool.map(my_function, ids_list)

Thanks for any help you can provide.

Antoine Dusséaux asked Oct 15 '16 21:10




1 Answer

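You can use the initializer parameter of the Pool constructor. Set up the DB connection in the initializer function, which runs once in each worker process, and store it in a global variable so that every task handled by that process reuses it. The connection credentials can be passed to the initializer through the initargs parameter.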

Have a look at the docs: https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool
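
Here is a minimal sketch of that approach, adapted from the code in the question. The names init_worker, update_all, db_params and _conn are made up for illustration, and my_table stands in for the real table name. It assumes psycopg2 is installed and that db_params holds the keyword arguments you would normally pass to psycopg2.connect().

```python
from multiprocessing import Pool

# One connection per worker process, set by init_worker().
# (Hypothetical name; any module-level variable works.)
_conn = None


def init_worker(db_params):
    """Pool initializer: runs once in every worker process."""
    global _conn
    # Imported here so the connection is always created inside the worker.
    import psycopg2
    _conn = psycopg2.connect(**db_params)


def my_function(item_id):
    # Other CPU-intensive operations are done here

    # Reuse the connection that this worker opened in init_worker().
    with _conn.cursor() as cur:
        cur.execute(
            "UPDATE my_table SET my_column = 1 WHERE id = %s;",
            (item_id,),
        )
    _conn.commit()


def update_all(ids_list, db_params, processes=None):
    """Run my_function over ids_list with one DB connection per worker."""
    with Pool(processes, initializer=init_worker,
              initargs=(db_params,)) as pool:
        pool.map(my_function, ids_list)
```

Call update_all(ids_list, {"database": ..., "user": ..., "password": ..., "host": ...}) from under the if __name__ == '__main__': guard, as in the question. With 8 workers this opens 8 connections in total instead of one per row. The sketch never closes them explicitly; it relies on the worker processes exiting when the pool shuts down.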

Sidias-Korrado answered Sep 20 '22 17:09