Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dynamically setting __tablename__ for sharding in SQLAlchemy?

In order to handle a growing database table, we are sharding on table name. So we could have database tables that are named like this:

table_md5one
table_md5two
table_md5three

All tables have the exact same schema.

How do we use SQLAlchemy and dynamically specify the tablename for the class that corresponds to this? Looks like the declarative_base() classes need to have tablename pre-specified.

There will eventually be too many tables to manually specify derived classes from a parent/base class. We want to be able to build a class that can have the tablename set up dynamically (maybe passed as a parameter to a function.)

like image 786
Suman Avatar asked Oct 03 '13 16:10

Suman


People also ask

What is Primary_key in SQLAlchemy?

All tables in a relational database should have primary keys. Even a many-to-many association table - the primary key would be the composite of the two association columns: CREATE TABLE my_association ( user_id INTEGER REFERENCES user(id), account_id INTEGER REFERENCES account(id), PRIMARY KEY (user_id, account_id) )

What is _sa_instance_state in SQLAlchemy?

_sa_instance_state is a non-database-persisted value used by SQLAlchemy internally (it refers to the InstanceState for the instance. While not directly relevant to this section, if we want to get at it, we should use the inspect() function to access it).

What is Automap in SQLAlchemy?

Define an extension to the sqlalchemy. ext. declarative system which automatically generates mapped classes and relationships from a database schema, typically though not necessarily one which is reflected.

What does SQLAlchemy all () return?

As the documentation says, all() returns the result of the query as a list.


3 Answers

OK, we went with the custom SQLAlchemy declaration rather than the declarative one.

So we create a dynamic table object like this:

from sqlalchemy import MetaData, Table, Column

def get_table_object(self, md5hash):
    metadata = MetaData()
    table_name = 'table_' + md5hash
    table_object = Table(table_name, metadata,
        Column('Column1', DATE, nullable=False),
        Column('Column2', DATE, nullable=False)
    )
    clear_mappers()
    mapper(ActualTableObject, table_object)
    return ActualTableObject

Where ActualTableObject is the class mapping to the table.

like image 82
Suman Avatar answered Oct 23 '22 23:10

Suman


In Augmenting the Base you find a way of using a custom Base class that can, for example, calculate the __tablename__ attribure dynamically:

class Base(object):
    @declared_attr
    def __tablename__(cls):
        return cls.__name__.lower()

The only problem here is that I don't know where your hash comes from, but this should give a good starting point.

If you require this algorithm not for all your tables but only for one you could just use the declared_attr on the table you are interested in sharding.

like image 15
javex Avatar answered Oct 23 '22 22:10

javex


Because I insist to use declarative classes with their __tablename__ dynamically specified by given parameter, after days of failing with other solutions and hours of studying SQLAlchemy internals, I come up with the following solution that I believe is simple, elegant and race-condition free.

def get_model(suffix):
    DynamicBase = declarative_base(class_registry=dict())

    class MyModel(DynamicBase):
        __tablename__ = 'table_{suffix}'.format(suffix=suffix)

        id = Column(Integer, primary_key=True)
        name = Column(String)
        ...

    return MyModel

Since they have their own class_registry, you will not get that warning saying:

This declarative base already contains a class with the same class name and module name as mypackage.models.MyModel, and will be replaced in the string-lookup table.

Hence, you will not be able to reference them from other models with string lookup. However, it works perfectly fine to use these on-the-fly declared models for foreign keys as well:

ParentModel1 = get_model(123)
ParentModel2 = get_model(456)

class MyChildModel(BaseModel):
    __tablename__ = 'table_child'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    parent_1_id = Column(Integer, ForeignKey(ParentModel1.id))
    parent_2_id = Column(Integer, ForeignKey(ParentModel2.id))
    parent_1 = relationship(ParentModel1)
    parent_2 = relationship(ParentModel2)

If you only use them to query/insert/update/delete without any reference left such as foreign key reference from another table, they, their base classes and also their class_registry will be garbage collected, so no trace will be left.

like image 8
kirpit Avatar answered Oct 23 '22 22:10

kirpit