
Python 3.x: how to share a database connection between processes?

I'm running a number of processes using multiprocessing.Pool.

Each process has to query my mysql database.

I currently connect to the database once and then share the connection between the processes.

It works but occasionally I get strange errors. I've confirmed that the errors are caused when querying the database.

I figured the problem is because the same connection is used for all the processes.

  • Is this correct?

As I looked for an answer I stumbled upon this q&a How to share a single MySQL database connection between multiple processes in Python

So I looked up the class pooling.MySQLConnectionPool:

  • http://dev.mysql.com/doc/connector-python/en/connector-python-connection-pooling.html
  • http://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlconnectionpool.html
  • http://dev.mysql.com/doc/connector-python/en/connector-python-api-pooledmysqlconnection.html

If I understand this correctly, I'll set up a pool with a number of connections and share the pool between the processes. Each process will then ask the pool for a connection and, if one is available, use it, or else wait until a connection is freed (see the sketch after this list).

  • Is this correct?
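For reference, basic use of such a pool looks roughly like this (a minimal sketch; the host, credentials, pool name and pool size are placeholder values). Note that, per the connector docs, get_connection() raises a PoolError when the pool is exhausted rather than blocking:

    import mysql.connector.pooling

    # A pool of five reusable connections (placeholder credentials).
    pool = mysql.connector.pooling.MySQLConnectionPool(
        pool_name="mypool",
        pool_size=5,
        host="localhost",
        user="user",
        password="secret",
        database="mydb",
    )

    conn = pool.get_connection()  # raises PoolError if the pool is exhausted
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        print(cur.fetchone())
    finally:
        conn.close()  # returns the connection to the pool instead of closing it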

But then I found this q&a Accessing a MySQL connection pool from Python multiprocessing

At first it seems that "mata" confirms what I suspected, but at the same time he dismisses setting up a pool to be shared between processes:

sharing a database connection (or connection pool) between different processes would be a bad idea (and i highly doubt it would even work correctly),

Instead, he suggests:

so each process using it's own connections is actually what you should aim for.

What does that mean?

  • Should I create a single connection for each worker? Then what are mysql pools good for?

The example given by mata in his answer seems reasonable enough, but I don't understand passing the entire pool as the init argument (the initializer mechanism is sketched after this list):

p = Pool(initializer=init)
  • Why? (As ph_singer points out in the comments, this is not a good solution.)
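For background on the mechanism itself: initializer is just a callable that every worker process runs once at startup, usually to set up per-worker globals. A generic sketch (the names here are mine, not mata's):

    from multiprocessing import Pool

    worker_state = {}

    def init():
        # Runs once in every worker process when it starts.
        worker_state["ready"] = True

    def work(x):
        # Each call sees the state its own process set up in init().
        return x * 2 if worker_state.get("ready") else None

    if __name__ == "__main__":
        with Pool(processes=2, initializer=init) as p:
            print(p.map(work, [1, 2, 3]))

Sharing a connection pool through this mechanism is what the comments warn against: with fork, each worker inherits copies of the same underlying sockets, so two processes can interleave traffic on one connection.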

Would changing the blocking Pool.map() call to Pool.map_async() and passing a connection from the pool to map_async(q, ConnObj) suffice?

  • Is this correct?

In the comments it's mentioned that

The only way of utilizing one single pool with many processes is having one dedicated process which does all the db access; communicate with it using a queue

UPDATE: I found this answer, which seems to agree (a sketch of the pattern follows below): https://stackoverflow.com/a/26072257/1267259

If you need large numbers of concurrent workers, but they're not using the DB all the time, you should have a group of database worker processes that handle all database access and exchange data with your other worker processes. Each database worker process has a DB connection. The other processes only talk to the database via your database workers.

Python's multiprocessing queues, fifos, etc offer appropriate messaging features for that.

  • Is this really correct?
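For concreteness, that dedicated-DB-process pattern would look roughly like this (a minimal sketch; the query and credentials are made up, and real code would also need to route results back to the right requester, e.g. with per-worker reply queues):

    import multiprocessing as mp

    import mysql.connector

    def db_worker(task_q, result_q):
        # The only process that ever touches the database.
        conn = mysql.connector.connect(host="localhost", user="user",
                                       password="secret", database="mydb")
        cur = conn.cursor()
        for query, params in iter(task_q.get, None):  # None = shutdown signal
            cur.execute(query, params)
            result_q.put(cur.fetchall())
        conn.close()

    if __name__ == "__main__":
        task_q, result_q = mp.Queue(), mp.Queue()
        db = mp.Process(target=db_worker, args=(task_q, result_q))
        db.start()
        # Worker processes would put (query, params) tuples on task_q;
        # here the main process stands in for them.
        task_q.put(("SELECT %s", (42,)))
        print(result_q.get())
        task_q.put(None)
        db.join()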

Isn't the purpose of a mysql pool to handle requests from the processes and relay them to an available connection?

Now I'm just confused...

asked Feb 20 '15 by user1267259



1 Answer

I found Share connection to postgres db across processes in Python. The answer to my first question seems to be:

You can't sanely share a DB connection across processes like that. You can sort-of share a connection between threads, but only if you make sure the connection is only used by one thread at a time. That won't work between processes because there's client-side state for the connection stored in the client's address space.

The answers to my remaining questions basically boil down to which of the following statements you go with (from the discussion in the comments of this q&a); a sketch of the per-process approach follows the quotes:

Basically, the idea is to create a connection pool in the main process, and then in each spawned thread/process, you request connections from that pool. Threads should not share the same identical connection, because then threads can block each other from one of the major activities that threading is supposed to help with: IO. – Mr. F

or

neither pass the pool nor a connection from the pool to child processes

Each child process creates its own db connections if it needs them (either individually or as a pool) – J.F. Sebastian.

and

"why use [db connections] pool" -- if there are multiple threads in your worker process then the pool might be useful (several threads can read/write data in parallel (CPython can release GIL during I/O)). If there is only one thread per worker process then there is no point to use the db pool. – J.F. Sebastian


As a side note

This doesn't exactly answer my third question, but it does present creating a connection per process as feasible in some cases (Share connection to postgres db across processes in Python):

It's unclear what you're looking for here. 5 connections certainly isn't an issue. Are you saying you may eventually need to spawn 100s or 1000s of processes, each with their own connection? If so, even if you could share them, they'd be bound to the connection pool, since only one process could use a given connection at any given time. – khampson Sep 27 '14 at 5:19

answered Oct 20 '22 by user1267259