Concurrency-safe way to initialize global data connections in Flask

Global variables are not thread-safe or "process-safe" in Flask.

However, I need to open connections to services that each worker will use, such as a PubSub client or a Cloud Storage client. It seems like these still need to be global so that any function in the application can access them. To lazily initialize them, I check if the variable is None, and this needs to be thread-safe. What is the recommended approach for opening connections that each request will use? Should I use a thread lock to synchronize?

Joshua Fox asked Jan 16 '19 10:01


1 Answer

The question you linked is about data, not connections. Having multiple workers mutate global data is not good because, in a web application, you can't reason about where each worker is in order to keep them in sync. Separate worker processes don't even share the same copy of that data.

The solution to that question is to use an external data source, such as a database, which must be connected to somehow. Your idea of having one global connection is not safe, though. Multiple worker threads would use it concurrently and would either corrupt each other's state or have to wait, one at a time, to acquire it. The simplest way to handle this is to establish a connection in each view when you need it.
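The following sketch makes the shared-connection problem concrete. It swaps in the standard-library sqlite3 module for the services in the question, only because sqlite3 needs no external server. By default, sqlite3 refuses to let a connection be used from a thread other than the one that created it (check_same_thread=True), so the failure is loud. Other clients may instead silently interleave requests or serialize on an internal lock. The helper names are hypothetical.

```python
import sqlite3
import threading

# A single module-level connection, created in the main thread.
global_conn = sqlite3.connect(":memory:")


def query_with_global():
    # Uses the shared connection from whichever thread calls this.
    return global_conn.execute("SELECT 1").fetchone()


def query_with_own_connection():
    # Per-call connection: opened and closed by the thread that needs it.
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute("SELECT 1").fetchone()
    finally:
        conn.close()


def run_in_thread(func):
    """Run func in a new thread; return its result or the exception it raised."""
    outcome = {}

    def target():
        try:
            outcome["value"] = func()
        except Exception as exc:
            outcome["value"] = exc

    t = threading.Thread(target=target)
    t.start()
    t.join()
    return outcome["value"]


print(type(run_in_thread(query_with_global)).__name__)  # ProgrammingError
print(run_in_thread(query_with_own_connection))         # (1,)
```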


This example shows how to have a unique connection per request without globals, reusing the connection once it has been established for that request. The g object looks like a global, but behind the scenes it is context-local, much like a thread-local. Each worker therefore gets its own g instance, and the connection stored on it lasts for one request only.

from flask import Flask, g, jsonify

app = Flask(__name__)

def get_conn():
    """Use this function to establish or get the already established
    connection during a request. The connection is closed at the end
    of the request. This avoids having a global connection by storing
    the connection on the g object per request.
    """
    if "conn" not in g:
        # make_connection is a placeholder for your client's own constructor.
        g.conn = make_connection(...)

    return g.conn

@app.teardown_request
def close_conn(e):
    """Automatically close the connection after the request if
    it was opened.
    """
    conn = g.pop("conn", None)

    if conn is not None:
        conn.close()

@app.route("/get_data")
def get_data():
    # If something else has already used get_conn during the
    # request, this will return the same connection. Anything
    # that uses it after this will also use the same connection.
    conn = get_conn()
    data = conn.query(...)
    return jsonify(data)
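For readers who want to see the get-or-create and teardown lifecycle without a running Flask app, here is a rough analogue of the pattern above. It uses threading.local as a stand-in for g. This is only an approximation and not how Flask implements g, which is tied to the application context. FakeConnection, get_conn, teardown, and handle_request are hypothetical names for illustration, and each thread plays the role of one request.

```python
import threading

# Per-thread storage standing in for flask.g in this illustration.
_request_state = threading.local()
opened = []  # every connection ever created, kept only for the demo


class FakeConnection:
    """Hypothetical connection type used only for this illustration."""

    def __init__(self):
        self.closed = False
        opened.append(self)

    def close(self):
        self.closed = True


def get_conn():
    # Create the connection on first use in this "request", then reuse it.
    conn = getattr(_request_state, "conn", None)
    if conn is None:
        conn = FakeConnection()
        _request_state.conn = conn
    return conn


def teardown():
    # Counterpart of close_conn: drop and close the connection if one exists.
    conn = getattr(_request_state, "conn", None)
    if conn is not None:
        del _request_state.conn
        conn.close()


def handle_request(results):
    # Two lookups in the same "request" return the same connection object.
    results.append(get_conn() is get_conn())
    teardown()


results = []
workers = [threading.Thread(target=handle_request, args=(results,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(results)                                     # [True, True, True, True]
print(len(opened), all(c.closed for c in opened))  # 4 True
```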

You might eventually find that establishing a new connection for each request is too expensive once you have many thousands of concurrent requests. One solution is a connection pool: a list of connections stored globally, with a thread-safe way to acquire a connection from the list and return or replace it as needed. SQLAlchemy (and therefore Flask-SQLAlchemy) uses this technique. Many libraries already provide connection pool implementations, so either use them directly or use them as a reference for your own.
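A minimal sketch of such a pool follows. It assumes a fixed number of connections created up front and uses queue.Queue, which handles its own locking, as the thread-safe list. ConnectionPool and its methods are hypothetical names, not SQLAlchemy's API. sqlite3 again stands in for a real service client.

```python
import contextlib
import queue
import sqlite3
import threading


class ConnectionPool:
    """Minimal fixed-size pool: connections are created up front and lent
    out one at a time; callers block while every connection is in use."""

    def __init__(self, factory, size):
        # queue.Queue does its own locking, so acquire/release are thread-safe.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    def acquire(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)

    @contextlib.contextmanager
    def connection(self, timeout=None):
        """Borrow a connection for the duration of a with-block."""
        conn = self.acquire(timeout)
        try:
            yield conn
        finally:
            self.release(conn)


# check_same_thread=False because pooled connections move between threads;
# the pool guarantees only one thread uses a given connection at a time.
pool = ConnectionPool(
    lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2
)


def handle_request(results):
    with pool.connection() as conn:
        results.append(conn.execute("SELECT 1").fetchone()[0])


results = []
workers = [threading.Thread(target=handle_request, args=(results,)) for _ in range(10)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(sum(results))  # 10
```

Ten requests share two connections here. A request that finds both in use simply blocks in acquire until one is released. Production pools such as SQLAlchemy's also detect and replace broken or stale connections and can grow beyond a base size. This sketch attempts none of that.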

davidism answered Oct 28 '22 04:10