
SQLAlchemy automap: Best practices for performance

Tags:

sqlalchemy

I'm building a Python app around an existing MySQL database and am using automap to infer tables and relationships:

    base = automap_base()

    self.engine = create_engine(
        'mysql://%s:%s@%s/%s?charset=utf8mb4' % (
            config.DB_USER, config.DB_PASSWD, config.DB_HOST, config.DB_NAME
        ), echo=False
    )

    # reflect the tables
    base.prepare(self.engine, reflect=True)

    self.TableName = base.classes.table_name

Using this I can do things like session.query(TableName) and so on. However, I'm worried about performance, because every time the app runs it will do the whole inference again.

  • Is this a legitimate concern?
  • If so, is there a possibility to 'cache' the output of Automap?
asked Jan 09 '17 by user2449397

People also ask

What is Automap in SQLAlchemy?

Define an extension to the sqlalchemy.ext.declarative system which automatically generates mapped classes and relationships from a database schema, typically though not necessarily one which is reflected.
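
As a minimal sketch of what that looks like in practice (the connection URL and the users table are made up for illustration):

from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session

# hypothetical connection URL and table, for illustration only
engine = create_engine('mysql://user:password@localhost/mydb')

Base = automap_base()
Base.prepare(engine, reflect=True)   # reflect the schema and generate classes

User = Base.classes.users            # mapped class generated from the 'users' table
session = Session(engine)
print(session.query(User).count())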

How fast is SQLAlchemy?

SQLAlchemy is very, very fast. It's just that users tend to be unaware of just how much functionality is being delivered, and confuse an ORM result set with that of a raw database cursor.

Should I use SQLAlchemy core or ORM?

If you want to view your data in a more schema-centric view (as used in SQL), use Core. If you have data for which business objects are not needed, use Core. If you view your data as business objects, use ORM. If you are building a quick prototype, use ORM.
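
To make the distinction concrete, here is the same lookup written both ways, reusing the engine, Base, session, and hypothetical users table from the sketch above:

from sqlalchemy import select

# Core: schema-centric -- works with Table objects and plain rows
users = Base.metadata.tables['users']
with engine.connect() as conn:
    rows = conn.execute(select([users]).where(users.c.id == 1)).fetchall()

# ORM: object-centric -- returns mapped business objects
user = session.query(User).filter(User.id == 1).one()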


3 Answers

Performance can be a legitimate concern. If the database schema is not changing, it can be time-consuming to reflect the database every time a script runs. This is more of an issue during development than when starting up a long-running application. It's also a significant time saver if your database is on a remote server (again, particularly during development).

I use code that is similar to the answer here (as noted by @ACV). The general plan is to perform the reflection the first time, then pickle the metadata object. The next time the script runs, it looks for the pickle file and uses that instead of reflecting. The file can be anywhere, but I place mine in ~/.sqlalchemy_cache. Here is an example based on your code.

import os
import pickle

from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base

self.engine = create_engine(
    'mysql://%s:%s@%s/%s?charset=utf8mb4' % (
        config.DB_USER, config.DB_PASSWD, config.DB_HOST, config.DB_NAME
    ), echo=False
)

metadata_pickle_filename = "mydb_metadata"
cache_path = os.path.join(os.path.expanduser("~"), ".sqlalchemy_cache")
cached_metadata = None
if os.path.exists(cache_path):
    try:
        with open(os.path.join(cache_path, metadata_pickle_filename), 'rb') as cache_file:
            cached_metadata = pickle.load(cache_file)
    except IOError:
        # cache file not found - no problem, reflect as usual
        pass

if cached_metadata:
    # build the classes from the cached metadata -- no round trip to the database
    base = automap_base(metadata=cached_metadata)
    base.prepare()
else:
    base = automap_base()
    base.prepare(self.engine, reflect=True)  # reflect the tables

    # save the metadata for future runs
    try:
        if not os.path.exists(cache_path):
            os.makedirs(cache_path)
        # make sure to open in binary mode - we're writing bytes, not str
        with open(os.path.join(cache_path, metadata_pickle_filename), 'wb') as cache_file:
            pickle.dump(base.metadata, cache_file)
    except IOError:
        # couldn't write the file for some reason
        pass

self.TableName = base.classes.table_name

For anyone using declarative table class definitions instead of automap, the same caching approach works. Assuming a Base object bound to the engine:

import os
import pickle

from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base(bind=engine)

metadata_pickle_filename = "ModelClasses_trilliandb_trillian.pickle"

# ------------------------------------------
# Load the cached metadata if it's available
# ------------------------------------------
# NOTE: delete the cached file if the database schema changes!!
cache_path = os.path.join(os.path.expanduser("~"), ".sqlalchemy_cache")
cached_metadata = None
if os.path.exists(cache_path):
    try:
        with open(os.path.join(cache_path, metadata_pickle_filename), 'rb') as cache_file:
            cached_metadata = pickle.load(cache_file)
    except IOError:
        # cache file not found - no problem
        pass
# ------------------------------------------

# define all tables
#
class MyTable(Base):
    if cached_metadata:
        __table__ = cached_metadata.tables['my_schema.my_table']
    else:
        __tablename__ = 'my_table'
        __table_args__ = {'autoload':True, 'schema':'my_schema'}

...
# ----------------------------------------
# If no cached metadata was found, save it
# ----------------------------------------
if cached_metadata is None:
    # cache the metadata for future loading
    # - MUST DELETE IF THE DATABASE SCHEMA HAS CHANGED
    try:
        if not os.path.exists(cache_path):
            os.makedirs(cache_path)
        # make sure to open in binary mode - we're writing bytes, not str
        with open(os.path.join(cache_path, metadata_pickle_filename), 'wb') as cache_file:
            pickle.dump(Base.metadata, cache_file)
    except IOError:
        # couldn't write the file for some reason
        pass

Important note!! If the database schema changes, you must delete the cached file to force the code to autoload and write a new cache. If you don't, your code will keep using the stale cached schema and the changes will not be visible to it. It's an easy thing to forget.
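
One way to make that harder to forget is a small helper you call from your migration or deploy step; a sketch, assuming the same cache_path and metadata_pickle_filename variables as above:

def clear_metadata_cache(cache_path, metadata_pickle_filename):
    """Delete the pickled metadata so the next run reflects the live schema."""
    cache_file = os.path.join(cache_path, metadata_pickle_filename)
    try:
        os.remove(cache_file)
        print("metadata cache cleared: %s" % cache_file)
    except OSError:
        # no cache file present -- nothing to do
        pass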

answered Sep 20 '22 by Demitri

I think that "reflecting" the structure of your database at run time is not the way to go. Unless your app needs to "infer" things from the structure, the way static code analysis infers things from source files, it is unnecessary. The other reason to reflect at run time is the reduced effort to start "using" the database with SQLAlchemy, but there is a better way to get that:

Use something like SQLACodegen (https://pypi.python.org/pypi/sqlacodegen):

It will "reflect" your database once and create a 99.5% accurate set of declarative SQLAlchemy models for you to work with. This does require that you subsequently keep the models in sync with the structure of the database, but I would assume that is not a big concern, seeing as the tables you're working with are already stable enough that run-time reflection of their structure would rarely show any changes.

Generating the declarative models is essentially a "cache" of the reflection. It's just that SQLACodegen saves it as a very readable set of classes and fields instead of data in memory. Even with a changing structure, and my own "changes" to the generated declarative models, I still re-run SQLACodegen later in a project whenever I make database changes. It means that my models are generally consistent with one another and that I don't have things such as typos and data mismatches caused by copy-pasting.
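
Typical usage is a one-off run against the live database, e.g. sqlacodegen mysql://user:password@localhost/mydb --outfile models.py, after which you maintain models.py by hand. The generated module looks roughly like this (table and column names here are invented for illustration):

# models.py -- roughly what sqlacodegen emits
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'

    id = Column(Integer, primary_key=True)
    name = Column(String(255))

class Address(Base):
    __tablename__ = 'addresses'

    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('users.id'))
    email = Column(String(255))

    user = relationship('User')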

answered Sep 21 '22 by Zoran Pavlovic


The answer to your first question is largely subjective. Reflection adds database queries (to fetch the schema metadata) to your application's load time. Whether or not that overhead is significant depends on your project's requirements.

For reference, I have an internal tool at work that uses a reflection pattern because the load time is acceptable for our team. That might not be the case if it were an externally facing product. My hunch is that for most applications the reflection overhead will not dominate the total application load time.

If you decide it is significant for your purposes, this question has an interesting answer in which the user pickles the database metadata in order to cache it locally.
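
If you want to quantify the overhead before deciding, timing the reflection step directly is enough; a quick sketch using an engine like the one in the question:

import time

from sqlalchemy.ext.automap import automap_base

start = time.time()
base = automap_base()
base.prepare(engine, reflect=True)   # the step a metadata cache would skip
print("reflection took %.2f seconds" % (time.time() - start))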

answered Sep 22 '22 by ACV