
Is connection pool in sqlalchemy thread-safe?

The documentation says that the connection pool is also not designed for multithreading:

It’s critical that when using a connection pool, and by extension when using an Engine created via create_engine(), that the pooled connections are not shared to a forked process. TCP connections are represented as file descriptors, which usually work across process boundaries, meaning this will cause concurrent access to the file descriptor on behalf of two or more entirely independent Python interpreter states.

As I understand this, if I create a connection pool:

self.engine = create_engine('postgresql://{user}:{password}@{host}:{port}/{db}'.format(
    user=Configuration().get(section='repository', option='user'),
    password=Configuration().get(section='repository', option='password'),
    host=Configuration().get(section='repository', option='host'),
    port=Configuration().get(section='repository', option='port'),
    db=Configuration().get(section='repository', option='database')
), echo=False, pool_size=3)

self.session = sessionmaker(self.engine, expire_on_commit=False)

and then call self.session() in different threads, I will have 3 connections used by N different threads. Does that mean that only 3 threads will do work concurrently while the others wait until one or more threads call session.close()? Or is there a chance that two or more threads will use the same connection simultaneously?

Is NullPool safer (because each new session gets its own connection), or not?

self.engine = create_engine('postgresql://{user}:{password}@{host}:{port}/{db}'.format(
    user=Configuration().get(section='repository', option='user'),
    password=Configuration().get(section='repository', option='password'),
    host=Configuration().get(section='repository', option='host'),
    port=Configuration().get(section='repository', option='port'),
    db=Configuration().get(section='repository', option='database')
), echo=False, poolclass=NullPool)

The general question: is it OK to use the same connection pool in a case like this:

from multiprocessing import Pool

engine = create_engine('connection_string', echo=False, pool_size=3)
Session = sessionmaker(engine)

def some_function(arg):
    session = Session()
    ...

pool = Pool(processes=10)
pool.map(some_function, range(10))
pool.close()
pool.join()
Asked Aug 09 '18 by Nikita Ryanov




2 Answers

All in all, there seems to be some confusion between threads and processes. The question begins by asking whether an SQLAlchemy connection pool is thread-safe, but ends with a code example that uses multiprocessing. The short answer to the "general question" is: no, you should not share an engine and its associated connection pool across process boundaries if forking is used. There are exceptions, though.

The pool implementations are themselves thread-safe, and by proxy an Engine is thread-safe as well, because an engine holds no state beyond a reference to its pool. On the other hand, the connections checked out from a pool are not thread-safe, and neither is a Session.
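A minimal sketch of that split, using an in-memory SQLite URL as a stand-in for the Postgres URL in the question: the engine and its pool are shared across threads, while each thread creates and closes its own Session.

```python
import threading

from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

# One engine (and therefore one pool) for the whole process; it is
# safe to share between threads.  'sqlite://' stands in for the real URL.
engine = create_engine('sqlite://')
Session = sessionmaker(engine)

results = []

def worker():
    # Each thread makes its own Session, which checks out its own
    # connection from the shared pool; Sessions must not be shared.
    session = Session()
    try:
        results.append(session.execute(text('SELECT 1')).scalar())
    finally:
        session.close()  # returns the connection to the pool

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # [1, 1, 1, 1, 1]
```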

The documentation says that the connection pool is also not designed for multithreading:

There's a bit of a misreading, since the original quote from the documentation is about sharing connection pools over process boundaries, if forking is used. This will likely lead to trouble, because beneath the SQLAlchemy and DB-API layers there is usually a TCP/IP socket or a file handle, and those should not be operated on concurrently.

In this particular case a NullPool would be safe where other pool implementations are not: since it does not pool at all, connections won't be shared between processes, unless one goes out of their way to share them.

Does that mean that only 3 threads will do work concurrently while the others wait until one or more threads call session.close()?

Assuming a QueuePool is in use, the configured size is not a hard limit; there is some room for overflow (max_overflow). pool_size determines the number of connections kept persistently in the pool. If the overflow limit is also reached, a checkout waits pool_timeout seconds before giving up and raising a TimeoutError, if no connection has become available.
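For illustration, here is a sketch of that overflow behaviour, with an in-memory SQLite engine forced onto a QueuePool (which Postgres would use by default); pool_size, max_overflow and pool_timeout are real create_engine parameters, and the URL is a stand-in.

```python
from sqlalchemy import create_engine
from sqlalchemy.exc import TimeoutError as PoolTimeoutError
from sqlalchemy.pool import QueuePool

engine = create_engine(
    'sqlite://',           # stand-in for the real database URL
    poolclass=QueuePool,   # what Postgres would get by default
    pool_size=3,           # connections kept persistently in the pool
    max_overflow=2,        # up to 3 + 2 = 5 checked out at once
    pool_timeout=0.5,      # seconds to wait before giving up
)

# All 3 pooled plus 2 overflow connections can be checked out...
conns = [engine.connect() for _ in range(5)]

# ...but a sixth checkout waits pool_timeout seconds, then raises.
extra = None
try:
    extra = engine.connect()
except PoolTimeoutError:
    print('pool exhausted')

for c in conns:
    c.close()  # closing returns the connections to the pool
```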

Or is there a chance that two or more threads will use the same connection simultaneously?

Two or more threads cannot accidentally check out the same connection from a pool, except from a StaticPool, though one could explicitly share a checked-out connection between threads afterwards (don't).


In the end, "Working with Engines and Connections - Basic Usage" covers the main parts of the question:

A single Engine manages many individual DBAPI connections on behalf of the process and is intended to be called upon in a concurrent fashion [emphasis added].

...

For a multiple-process application that uses the os.fork system call, or for example the Python multiprocessing module, it’s usually required that a separate Engine be used for each child process. This is because the Engine maintains a reference to a connection pool that ultimately references DBAPI connections - these tend to not be portable across process boundaries. An Engine that is configured not to use pooling (which is achieved via the usage of NullPool) does not have this requirement.
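A sketch of that per-child-process arrangement, applied to the question's multiprocessing example (the initializer hook is a standard multiprocessing feature; the SQLite URL stands in for the real connection string):

```python
from multiprocessing import Pool

from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

engine = None
Session = None

def init_worker():
    # Runs once inside each child process, so every child builds its
    # own engine and pool; no pooled connection crosses the fork.
    global engine, Session
    engine = create_engine('sqlite://')  # stand-in for the real URL
    Session = sessionmaker(engine)

def some_function(arg):
    session = Session()
    try:
        return session.execute(text('SELECT :x'), {'x': arg}).scalar()
    finally:
        session.close()

if __name__ == '__main__':
    with Pool(processes=4, initializer=init_worker) as pool:
        print(pool.map(some_function, range(4)))  # [0, 1, 2, 3]
```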

Answered Sep 23 '22 by Ilja Everilä


In case this helps anyone else -- and it's really the answer to a different question, which would be:

Does SQLAlchemy use the same connection pool for all engines in the same thread?

The answer is no. As @ilja-everila points out, SQLA expects you to use a single engine per process. So if you do

engine1 = create_engine(...)
engine2 = create_engine(...)
engine1.pool is engine2.pool   # <- False

# so although pool_size=5, you can open more than 5 total connections,
# because each engine has its own separate pool
connections1 = [engine1.connect() for _ in range(5)]
connections2 = [engine2.connect() for _ in range(5)]

So if you came here wondering why you're maxing out max_connections: if your code creates lots of separate engine instances, even in the same thread, you can't expect them to share a connection pool.

Connection pools might be thread-safe, but they're unique to each engine instance.

So you should aim to have one global/singleton engine instance for your app.
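A sketch of that pattern (the module name db.py and the URL are illustrative): because Python caches imported modules, a module-level engine behaves as a process-wide singleton.

```python
# db.py -- import this module everywhere you need database access;
# every importer sees the same engine, and therefore the same pool.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite://')  # your real database URL here
Session = sessionmaker(engine)
```

Elsewhere in the app: `from db import Session`, then `session = Session()` per unit of work.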

Learn from my fail!

Answered Sep 22 '22 by hwjp