Multi-tenancy with SQLAlchemy

I have a web application built with Pyramid, SQLAlchemy, and PostgreSQL that lets users manage some data, and that data is almost completely independent between users. Say Alice visits alice.domain.com and can upload pictures and documents, and Bob visits bob.domain.com and can do the same. Alice never sees anything created by Bob and vice versa (this is a simplified example; in reality there may be a lot of data in multiple tables, but the idea is the same).

Now, the most straightforward way to organize the data in the DB backend is to use a single database where each table (pictures and documents) has a user_id field. To get all of Alice's pictures, I can do something like:

user_id = _figure_out_user_id_from_domain_name(request)
pictures = session.query(Picture).filter(Picture.user_id==user_id).all()
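
The first step of the hypothetical `_figure_out_user_id_from_domain_name` helper could look roughly like the sketch below. It only extracts the subdomain label from a Host header value (for example, what Pyramid exposes as `request.host`); mapping that label to a numeric `user_id` would still need a lookup that is not shown. The names `tenant_from_host` and `BASE_DOMAIN` are made up for illustration.

```python
from typing import Optional

# Assumption: the shared base domain from the example hostnames above.
BASE_DOMAIN = "domain.com"


def tenant_from_host(host: str, base_domain: str = BASE_DOMAIN) -> Optional[str]:
    """Return 'alice' for 'alice.domain.com', or None if no tenant label is present."""
    # Drop an optional ":port" suffix, normalize case, and strip a trailing dot.
    hostname = host.split(":", 1)[0].strip().lower().rstrip(".")
    suffix = "." + base_domain
    if not hostname.endswith(suffix):
        return None
    label = hostname[: -len(suffix)]
    # Reject empty labels and nested subdomains such as "a.b.domain.com".
    if not label or "." in label:
        return None
    return label
```

Returning `None` for unknown hosts lets the caller decide whether to answer with a 404 or fall back to a default site.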

This is all easy and simple; however, there are some disadvantages:

  • I need to remember to always add the filter condition when making queries, otherwise Alice may see Bob's pictures;
  • if there are many users, the tables may grow huge;
  • it may be tricky to split the web application across multiple machines.

So I'm thinking it would be really nice to somehow split the data per-user. I can think of two approaches:

  1. Have separate tables for Alice's and Bob's pictures and documents within the same database (PostgreSQL schemas seem to be the right approach in this case):

    documents_alice
    documents_bob
    pictures_alice
    pictures_bob
    

    and then, using some dark magic, "route" all queries to one or to the other table according to the current request's domain:

    _use_dark_magic_to_configure_sqlalchemy('alice.domain.com')
    pictures = session.query(Picture).all()  # selects all Alice's pictures from "pictures_alice" table
    ...
    _use_dark_magic_to_configure_sqlalchemy('bob.domain.com')
    pictures = session.query(Picture).all()  # selects all Bob's pictures from "pictures_bob" table
    
  2. Use a separate database for each user:

    - database_alice
       - pictures
       - documents
    - database_bob
       - pictures
       - documents 
    

    which seems like the cleanest solution, but I'm not sure whether multiple database connections would require much more RAM and other resources, limiting the number of possible "tenants".

So, the question is, does it all make sense? If yes, how do I configure SQLAlchemy to either modify the table names dynamically on each HTTP request (for option 1) or to maintain a pool of connections to different databases and use the correct connection for each request (for option 2)?
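
For option 2, one possible shape is a small registry that lazily creates and caches one database handle per tenant. The sketch below is only an illustration under stated assumptions: `TenantRegistry` and `open_tenant_db` are hypothetical names, and in-memory SQLite connections stand in for real per-tenant engines so the example runs without a PostgreSQL server.

```python
import sqlite3
from threading import Lock
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class TenantRegistry(Generic[T]):
    """Lazily create and cache one resource (e.g. an engine) per tenant."""

    def __init__(self, factory: Callable[[str], T]) -> None:
        self._factory = factory
        self._items: Dict[str, T] = {}
        self._lock = Lock()

    def get(self, tenant: str) -> T:
        # Create the resource on first use; later calls reuse the cached one.
        with self._lock:
            if tenant not in self._items:
                self._items[tenant] = self._factory(tenant)
            return self._items[tenant]

    def __len__(self) -> int:
        return len(self._items)


# Stand-in factory (assumption): with SQLAlchemy this could instead be
#   lambda tenant: create_engine("postgresql:///database_" + tenant)
def open_tenant_db(tenant: str) -> sqlite3.Connection:
    return sqlite3.connect(":memory:", check_same_thread=False)


registry = TenantRegistry(open_tenant_db)

alice_db = registry.get("alice")
alice_db.execute("CREATE TABLE pictures (id INTEGER PRIMARY KEY, name TEXT)")
alice_db.execute("INSERT INTO pictures (name) VALUES ('cat.png')")

bob_db = registry.get("bob")
bob_db.execute("CREATE TABLE pictures (id INTEGER PRIMARY KEY, name TEXT)")
```

If the factory returns SQLAlchemy engines, each engine keeps its own connection pool, so the number of idle connections grows with the number of active tenants. That is the resource cost the question is worried about, and it may call for small pool sizes or eviction of rarely used tenants.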

Sergey asked Nov 14 '12 02:11


1 Answer

After pondering jd's answer, I was able to achieve the same result with PostgreSQL 9.2, SQLAlchemy 0.8, and Flask 0.9:

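from flask import session  # assumed: Flask's per-request session holding tenant_id
from sqlalchemy import event
from sqlalchemy.pool import Pool


@event.listens_for(Pool, 'checkout')
def on_pool_checkout(dbapi_conn, connection_rec, connection_proxy):
    # Runs each time a connection is checked out of the pool.
    tenant_id = session.get('tenant_id')
    cursor = dbapi_conn.cursor()
    if tenant_id is None:
        # No tenant selected: fall back to the public and shared schemas.
        cursor.execute("SET search_path TO public, shared;")
    else:
        # Tenant schemas are named t<tenant_id>, for example t42.
        cursor.execute("SET search_path TO t" + str(tenant_id) + ", shared;")
    dbapi_conn.commit()
    cursor.close()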
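
Since the listener above builds the `SET search_path` statement by string concatenation, it may be worth validating `tenant_id` first. The helper below is a hypothetical sketch (the name `search_path_statement` is not part of the original answer) that assumes tenant ids are non-negative integers and produces the same two statements.

```python
from typing import Optional, Union


def search_path_statement(tenant_id: Optional[Union[int, str]]) -> str:
    """Build the SET search_path statement for a tenant id, or the default one."""
    if tenant_id is None:
        return "SET search_path TO public, shared;"
    # int() raises ValueError for text that is not an integer literal,
    # so arbitrary strings never reach the SQL statement.
    number = int(tenant_id)
    if number < 0:
        raise ValueError("tenant_id must be non-negative")
    return "SET search_path TO t%d, shared;" % number
```

Inside the listener, `cursor.execute(search_path_statement(tenant_id))` could then replace both branches.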
synergetic answered Sep 20 '22 16:09