Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Star schema in SQLAlchemy

I have a star-schema architectured database that I want to represent in SQLAlchemy. Now I have the problem on how this can be done in the best possible way. Right now I have a lot of properties with custom join conditions, because the data is stored in different tables. It would be nice if it would be possible to re-use the dimensions for different fact tablesw but I haven't figured out how that can be done nicely.

like image 465
Joakim Lundborg Avatar asked Sep 16 '09 23:09

Joakim Lundborg


People also ask

What is schema in SQLAlchemy?

SQLAlchemy schema metadata is a comprehensive system of describing and inspecting database schemas. The core of SQLAlchemy's query and object mapping operations is supported by database metadata.

What is the use of MetaData in SQLAlchemy?

Metadata contains definitions of tables and associated objects such as index, view, triggers, etc. Hence an object of MetaData class from SQLAlchemy Metadata is a collection of Table objects and their associated schema constructs.

What is MetaData in flask SQLAlchemy?

MetaData is a container object that keeps together many different features of a database (or multiple databases) being described. To represent a table, use the Table class. Its two primary arguments are the table name, then the MetaData object which it will be associated with.

What is declarative base in SQLAlchemy?

The Declarative system is the typically used system provided by the SQLAlchemy ORM in order to define classes mapped to relational database tables. However, as noted in Classical Mappings, Declarative is in fact a series of extensions that ride on top of the SQLAlchemy mapper() construct.


1 Answers

A typical fact table in a star schema contains foreign key references to all dimension tables, so usually there wouldn't be any need for custom join conditions - they are determined automatically from foreign key references.

For example a star schema with two fact tables would look like:

Base = declarative_meta()

class Store(Base):
    __tablename__ = 'store'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

class Product(Base):
    __tablename__ = 'product'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

class FactOne(Base):
    __tablename__ = 'sales_fact_one'

    store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
    product_id = Column('product_id', Integer, ForeignKey('product.id'), primary_key=True)
    units_sold = Column('units_sold', Integer, nullable=False)

    store = relation(Store)
    product = relation(Product)

class FactTwo(Base):
    __tablename__ = 'sales_fact_two'

    store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
    product_id = Column('product_id', Integer, ForeignKey('product.id'), primary_key=True)
    units_sold = Column('units_sold', Integer, nullable=False)

    store = relation(Store)
    product = relation(Product)

But suppose you want to reduce the boilerplate in any case. I'd create generators local to the dimension classes which configure themselves on a fact table:

class Store(Base):
    __tablename__ = 'store'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

    @classmethod
    def add_dimension(cls, target):
        target.store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
        target.store = relation(cls)

in which case usage would be like:

class FactOne(Base):
    ...

Store.add_dimension(FactOne)

But, there's a problem with that. Assuming the dimension columns you're adding are primary key columns, the mapper configuration is going to fail since a class needs to have its primary keys set up before the mapping is set up. So assuming we're using declarative (which you'll see below has a nice effect), to make this approach work we'd have to use the instrument_declarative() function instead of the standard metaclass:

meta = MetaData()
registry = {}
def register_cls(*cls):
    for c in cls:
        instrument_declarative(c, registry, meta)

So then we'd do something along the lines of:

class Store(object):
    # ...

class FactOne(object):
    __tablename__ = 'sales_fact_one'

Store.add_dimension(FactOne)

register_cls(Store, FactOne)

If you actually have a good reason for custom join conditions, as long as there's some pattern to how those conditions are created, you can generate that with your add_dimension():

class Store(object):
    ...

    @classmethod
    def add_dimension(cls, target):
        target.store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
        target.store = relation(cls, primaryjoin=target.store_id==cls.id)

But the final cool thing if you're on 2.6, is to turn add_dimension into a class decorator. Here's an example with everything cleaned up:

from sqlalchemy import *
from sqlalchemy.ext.declarative import instrument_declarative
from sqlalchemy.orm import *

class BaseMeta(type):
    classes = set()
    def __init__(cls, classname, bases, dict_):
        klass = type.__init__(cls, classname, bases, dict_)
        if 'metadata' not in dict_:
            BaseMeta.classes.add(cls)
        return klass

class Base(object):
    __metaclass__ = BaseMeta
    metadata = MetaData()
    def __init__(self, **kw):
        for k in kw:
            setattr(self, k, kw[k])

    @classmethod
    def configure(cls, *klasses):
        registry = {}
        for c in BaseMeta.classes:
            instrument_declarative(c, registry, cls.metadata)

class Store(Base):
    __tablename__ = 'store'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

    @classmethod
    def dimension(cls, target):
        target.store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
        target.store = relation(cls)
        return target

class Product(Base):
    __tablename__ = 'product'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

    @classmethod
    def dimension(cls, target):
        target.product_id = Column('product_id', Integer, ForeignKey('product.id'), primary_key=True)
        target.product = relation(cls)
        return target

@Store.dimension
@Product.dimension
class FactOne(Base):
    __tablename__ = 'sales_fact_one'

    units_sold = Column('units_sold', Integer, nullable=False)

@Store.dimension
@Product.dimension
class FactTwo(Base):
    __tablename__ = 'sales_fact_two'

    units_sold = Column('units_sold', Integer, nullable=False)

Base.configure()

if __name__ == '__main__':
    engine = create_engine('sqlite://', echo=True)
    Base.metadata.create_all(engine)

    sess = sessionmaker(engine)()

    sess.add(FactOne(store=Store(name='s1'), product=Product(name='p1'), units_sold=27))
    sess.commit()
like image 128
zzzeek Avatar answered Sep 19 '22 16:09

zzzeek