I'm running a number of processes using multiprocessing.Pool.
Each process has to query my MySQL database.
I currently connect to the database once and then share the connection between the processes.
It works, but occasionally I get strange errors. I've confirmed that the errors occur when querying the database.
I figure the problem is that the same connection is being used by all the processes.
While looking for an answer I stumbled upon this Q&A: How to share a single MySQL database connection between multiple processes in Python
So I looked up the class pooling.MySQLConnectionPool.
If I understand it correctly, I would set up a pool with a number of connections and share the pool between processes. Each process would then look into that pool and, if a connection is available, use it, or else wait until a connection is freed.
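To check my understanding of what a pool even does, here is a minimal sketch of the check-out/wait/release behaviour described above. This is not mysql.connector's implementation, just an illustration: FakeConnection and SketchPool are made-up names, and plain objects stand in for real MySQL connections so it runs without a server.

```python
import queue

class FakeConnection:
    """Stand-in for a real MySQL connection (illustration only)."""
    def __init__(self, conn_id):
        self.conn_id = conn_id

class SketchPool:
    """Toy pool: a fixed set of connections handed out one at a time."""
    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(FakeConnection(i))

    def get_connection(self, timeout=None):
        # Blocks until a connection is free (queue.Empty on timeout).
        return self._free.get(timeout=timeout)

    def release(self, conn):
        # Return the connection so another caller can use it.
        self._free.put(conn)

pool = SketchPool(size=2)
c1 = pool.get_connection()
c2 = pool.get_connection()   # pool is now empty; a third get() would block
pool.release(c1)
c3 = pool.get_connection()   # reuses the freed connection
```

The key point: the pool doesn't multiplex queries over one connection; it serializes access so each connection has only one user at a time.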
But then I found this Q&A: Accessing a MySQL connection pool from Python multiprocessing
At first it seems that "mata" confirms what I suspected, but at the same time he dismisses setting up a pool to be shared between processes:
sharing a database connection (or connection pool) between different processes would be a bad idea (and i highly doubt it would even work correctly),
Instead he suggests:
so each process using it's own connections is actually what you should aim for.
What does that mean?
The example given by mata in his answer seems reasonable enough, but I don't understand the passing of the entire pool as the init argument:
p = Pool(initializer=init)
Would changing the blocking Pool.map() method to Pool.map_async() and passing a connection from the pool to map_async(q, ConnObj) suffice?
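For what it's worth, my reading of that pattern (this is my own runnable sketch, not mata's exact code) is that nothing is passed into init at all: the initializer runs once inside each worker process and creates that process's own connection, stored in a module-level global. Here a dict recording the worker's pid stands in for a real mysql.connector.connect(...) call so the example runs anywhere.

```python
import multiprocessing
import os

conn = None  # per-process global; each worker gets its own copy

def init():
    # Runs once in EACH worker process. A real version would do:
    #   conn = mysql.connector.connect(host=..., user=..., ...)
    global conn
    conn = {"pid": os.getpid()}  # stub connection

def query(n):
    # Every task executed by this worker reuses the worker's own connection.
    return (n, conn["pid"])

if __name__ == "__main__":
    with multiprocessing.Pool(processes=2, initializer=init) as p:
        results = p.map(query, range(4))
    # At most 2 distinct pids appear: one connection per worker process,
    # never one connection shared across processes.
    distinct_pids = {pid for _, pid in results}
```

So the pool of *processes* and the database *connections* line up one-to-one; no connection object ever crosses a process boundary.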
In the comments it's mentioned that:
The only way of utilizing one single pool with many processes is having one dedicated process which does all the db access; communicate with it using a queue
UPDATE: I found this answer, which seems to agree: https://stackoverflow.com/a/26072257/1267259
If you need large numbers of concurrent workers, but they're not using the DB all the time, you should have a group of database worker processes that handle all database access and exchange data with your other worker processes. Each database worker process has a DB connection. The other processes only talk to the database via your database workers.
Python's multiprocessing queues, fifos, etc offer appropriate messaging features for that.
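The dedicated-DB-process idea from that quote can be sketched like this. It is a minimal illustration, not production code: run_query is a stub standing in for real MySQL access, and a None sentinel shuts the worker down. Only db_worker would ever hold a connection; everyone else just exchanges messages.

```python
import multiprocessing

def run_query(sql):
    # Stub: a real version would execute sql on this process's one
    # MySQL connection and return the rows.
    return f"result of {sql!r}"

def db_worker(requests, results):
    # The ONLY process that touches the database.
    while True:
        item = requests.get()
        if item is None:          # sentinel: shut down
            break
        job_id, sql = item
        results.put((job_id, run_query(sql)))

if __name__ == "__main__":
    requests = multiprocessing.Queue()
    results = multiprocessing.Queue()
    worker = multiprocessing.Process(target=db_worker,
                                     args=(requests, results))
    worker.start()
    # Any number of other processes could put requests on this queue;
    # here the main process plays that role.
    for i, sql in enumerate(["SELECT 1", "SELECT 2"]):
        requests.put((i, sql))
    replies = dict(results.get() for _ in range(2))
    requests.put(None)            # tell the worker to exit
    worker.join()
```

The queue serializes all database access through one process, which is exactly why no connection (or pool) ever needs to be shared.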
But isn't the purpose of a MySQL pool to handle requests from the processes and relay them to an available connection?
Now I'm just confused...
I found Share connection to postgres db across processes in Python.
The answer to my first question seems to be:
You can't sanely share a DB connection across processes like that. You can sort-of share a connection between threads, but only if you make sure the connection is only used by one thread at a time. That won't work between processes because there's client-side state for the connection stored in the client's address space.
The answer to my remaining questions basically boils down to which of the following statements you go with (from the discussion in the comments of that Q&A):
Basically, the idea is to create a connection pool in the main process, and then in each spawned thread/process, you request connections from that pool. Threads should not share the same identical connection, because then threads can block each other from one of the major activities that threading is supposed to help with: IO. – Mr. F
or
neither pass the pool or a connection from the pool to childprocesses
Each child process creates its own db connections if it needs them (either individually or as a pool) – J.F. Sebastian.
and
"why use [db connections] pool" -- if there are multiple threads in your worker process then the pool might be useful (several threads can read/write data in parallel (CPython can release GIL during I/O)). If there is only one thread per worker process then there is no point to use the db pool. – J.F. Sebastian
As a side note, this doesn't exactly answer my third question, but it does present creating a connection per process as feasible in some cases (Share connection to postgres db across processes in Python):
It's unclear what you're looking for here. 5 connections certainly isn't an issue. Are you saying you may eventually need to spawn 100s or 1000s of processes, each with their own connection? If so, even if you could share them, they'd be bound to the connection pool, since only one process could use a given connection at any given time. – khampson Sep 27 '14 at 5:19