
Python 3.x: how to share a database connection between processes?

I'm running a number of processes using multiprocessing.Pool.

Each process has to query my mysql database.

I currently connect to the database once and then share the connection between the processes.

It works but occasionally I get strange errors. I've confirmed that the errors are caused when querying the database.

I figured the problem is because the same connection is used for all the processes.

  • Is this correct?

As I looked for an answer I stumbled upon this q&a How to share a single MySQL database connection between multiple processes in Python

So I looked up the class pooling.MySQLConnectionPool:

  • http://dev.mysql.com/doc/connector-python/en/connector-python-connection-pooling.html
  • http://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlconnectionpool.html
  • http://dev.mysql.com/doc/connector-python/en/connector-python-api-pooledmysqlconnection.html

If I understand this correctly, I'll set up a pool with a number of connections and share the pool between the processes. Each process will then ask the pool for a connection and, if one is available, use it, or else wait until a connection is freed (see the sketch after this list).

  • Is this correct?
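For reference, basic use of such a pool looks roughly like this (a minimal sketch; the host, credentials, pool name and pool size are placeholder values). Note that, per the connector docs, get_connection() raises a PoolError when the pool is exhausted rather than blocking:

    import mysql.connector.pooling

    # A pool of five reusable connections (placeholder credentials).
    pool = mysql.connector.pooling.MySQLConnectionPool(
        pool_name="mypool",
        pool_size=5,
        host="localhost",
        user="user",
        password="secret",
        database="mydb",
    )

    conn = pool.get_connection()  # raises PoolError if the pool is exhausted
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        print(cur.fetchone())
    finally:
        conn.close()  # returns the connection to the pool instead of closing it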

But then I found this q&a Accessing a MySQL connection pool from Python multiprocessing

At first it seems that "mata" confirms what I suspected, but at the same time he dismisses setting up a pool to be shared between processes:

sharing a database connection (or connection pool) between different processes would be a bad idea (and i highly doubt it would even work correctly),

Instead, he suggests:

so each process using it's own connections is actually what you should aim for.

What does that mean?

  • Should I create a single connection for each worker? Then what are mysql pools good for?

The example given by mata in his answer seems reasonable enough, but I don't understand passing the entire pool as the init argument (the initializer mechanism is sketched after this list):

p = Pool(initializer=init)
  • Why? (As ph_singer points out in the comments, this is not a good solution.)
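For background on the mechanism itself: initializer is just a callable that every worker process runs once at startup, usually to set up per-worker globals. A generic sketch (the names here are mine, not mata's):

    from multiprocessing import Pool

    worker_state = {}

    def init():
        # Runs once in every worker process when it starts.
        worker_state["ready"] = True

    def work(x):
        # Each call sees the state its own process set up in init().
        return x * 2 if worker_state.get("ready") else None

    if __name__ == "__main__":
        with Pool(processes=2, initializer=init) as p:
            print(p.map(work, [1, 2, 3]))

Sharing a connection pool through this mechanism is what the comments warn against: with fork, each worker inherits copies of the same underlying sockets, so two processes can interleave traffic on one connection.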

Would changing the blocking Pool.map() call to Pool.map_async() and passing a connection from the pool to map_async(q, ConnObj) suffice?

  • Is this correct?

In the comments it's mentioned that

The only way of utilizing one single pool with many processes is having one dedicated process which does all the db access; communicate with it using a queue

UPDATE: I found this answer, which seems to agree (a sketch of the pattern follows below): https://stackoverflow.com/a/26072257/1267259

If you need large numbers of concurrent workers, but they're not using the DB all the time, you should have a group of database worker processes that handle all database access and exchange data with your other worker processes. Each database worker process has a DB connection. The other processes only talk to the database via your database workers.

Python's multiprocessing queues, fifos, etc offer appropriate messaging features for that.

  • Is this really correct?
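For concreteness, that dedicated-DB-process pattern would look roughly like this (a minimal sketch; the query and credentials are made up, and real code would also need to route results back to the right requester, e.g. with per-worker reply queues):

    import multiprocessing as mp

    import mysql.connector

    def db_worker(task_q, result_q):
        # The only process that ever touches the database.
        conn = mysql.connector.connect(host="localhost", user="user",
                                       password="secret", database="mydb")
        cur = conn.cursor()
        for query, params in iter(task_q.get, None):  # None = shutdown signal
            cur.execute(query, params)
            result_q.put(cur.fetchall())
        conn.close()

    if __name__ == "__main__":
        task_q, result_q = mp.Queue(), mp.Queue()
        db = mp.Process(target=db_worker, args=(task_q, result_q))
        db.start()
        # Worker processes would put (query, params) tuples on task_q;
        # here the main process stands in for them.
        task_q.put(("SELECT %s", (42,)))
        print(result_q.get())
        task_q.put(None)
        db.join()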

Isn't the purpose of a mysql pool to handle requests from the processes and relay them to an available connection?

Now I'm just confused...

asked Feb 20 '15 by user1267259



1 Answer

I found Share connection to postgres db across processes in Python. The answer to my first question seems to be:

You can't sanely share a DB connection across processes like that. You can sort-of share a connection between threads, but only if you make sure the connection is only used by one thread at a time. That won't work between processes because there's client-side state for the connection stored in the client's address space.

The answers to my remaining questions basically boil down to which of the following statements you go with (from the discussion in the comments of this q&a); a sketch of the per-process approach follows the quotes:

Basically, the idea is to create a connection pool in the main process, and then in each spawned thread/process, you request connections from that pool. Threads should not share the same identical connection, because then threads can block each other from one of the major activities that threading is supposed to help with: IO. – Mr. F

or

neither pass the pool nor a connection from the pool to child processes

Each child process creates its own db connections if it needs them (either individually or as a pool) – J.F. Sebastian.

and

"why use [db connections] pool" -- if there are multiple threads in your worker process then the pool might be useful (several threads can read/write data in parallel (CPython can release GIL during I/O)). If there is only one thread per worker process then there is no point to use the db pool. – J.F. Sebastian


As a side note

This doesn't exactly answer my third question, but it does present creating a connection per process as feasible in some cases (Share connection to postgres db across processes in Python):

It's unclear what you're looking for here. 5 connections certainly isn't an issue. Are you saying you may eventually need to spawn 100s or 1000s of processes, each with their own connection? If so, even if you could share them, they'd be bound to the connection pool, since only one process could use a given connection at any given time. – khampson Sep 27 '14 at 5:19

answered Oct 20 '22 by user1267259