Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

sqlalchemy how to generate (many-to-many) relationships with automap_base

As a background: I'm creating an ORM based on a schema of an already existing database. - This due to the fact that the python application won't be the "owner" of said database.

Now in this database there is a table called "task" and a table called "task_notBefore__task_relatedTasks" - this latter is a many-to-many relation between different entries in the "task" table.

now automap_base() has an automated detection of these relationships as described here. However this fails for my case, and no relationship is being build.

I then try to manually create the relationship:

from sqlalchemy.ext.automap import automap_base
from sqlalchemy.ext.automap import generate_relationship
from sqlalchemy.orm import sessionmaker, interfaces, relationship
from sqlalchemy import create_engine

class DBConnection:
    def __init__(self, connection_url, **kwargs):
        self.engine = create_engine(connection_url, **kwargs)
        self._Base = automap_base()

        self._Base.prepare(self.engine, reflect=True)

        self.Task = self._Base.classes.task
        self.Order = self._Base.classes.order
        self.Poller = self._Base.classes.poller

        rel = generate_relationship(self._Base, interfaces.MANYTOMANY, relationship, 'related', self.Task, self.Task,
                                    secondary=self._Base.classes.task_notBefore__task_relatedTasks, backref='notBefore')

        self._Session = sessionmaker()
        self._Session.configure(bind=self.engine)

        self.session = self._Session()

However this still doesn't "do" anything: it doesn't add anything to the self.Task "class".

How would one do this?

like image 355
paul23 Avatar asked Mar 01 '18 15:03

paul23


People also ask

How do you create a many to many relationship in SQLAlchemy?

Python Flask and SQLAlchemy ORM Many to Many relationship between two tables is achieved by adding an association table such that it has two foreign keys - one from each table's primary key.

What is Automap in SQLAlchemy?

Define an extension to the sqlalchemy. ext. declarative system which automatically generates mapped classes and relationships from a database schema, typically though not necessarily one which is reflected.

What is Backpopulate in SQLAlchemy?

The back_populates argument tells SqlAlchemy which column to link with when it joins the two tables. It allows you to access the linked records as a list with something like Parent.

What is declarative_base?

The declarative_base() base class contains a MetaData object where newly defined Table objects are collected. This object is intended to be accessed directly for MetaData -specific operations. Such as, to issue CREATE statements for all tables: engine = create_engine('sqlite://') Base.


1 Answers

The primary problem in this case is not just the many-to-many relationship, but the fact that it's a self-referential, many-to-many relationship. Because automap is simply translating the mapped class names to relationship names, it constructs the same name, e.g. task_collection, for both directions of the relationship, and the naming collision generates the error. This shortcoming of automap feels significant in that self-referential, many-to-many relationships are not uncommon.

Explicitly adding the relationships you want, using your own names, won't solve the problem because automap will still try to create the task_collection relationships. To deal with this issue, we need to override task_collection.

If you're okay with keeping the name task_collection for the forward direction of the relationship, we can simply pre-define the relationship--specifying whatever name we want for the backref. If automap finds the expected property already in place, it will assume the relationship is being overridden and not try to add it.

Here's a stripped down example, along with the an sqlite database for testing.

Sqlite Database

CREATE TABLE task (
    id INTEGER, 
    name VARCHAR,
    PRIMARY KEY (id)
);

CREATE TABLE task_task (
    tid1 INTEGER,
    tid2 INTEGER,
    FOREIGN KEY(tid1) REFERENCES task(id),
    FOREIGN KEY(tid2) REFERENCES task(id)
);

-- Some sample data
INSERT INTO task VALUES (0, 'task_0');
INSERT INTO task VALUES (1, 'task_1');
INSERT INTO task VALUES (2, 'task_2');
INSERT INTO task VALUES (3, 'task_3');
INSERT INTO task VALUES (4, 'task_4');

INSERT INTO task_task VALUES (0, 1);
INSERT INTO task_task VALUES (0, 2);

INSERT INTO task_task VALUES (2, 4);
INSERT INTO task_task VALUES (3, 4);

INSERT INTO task_task VALUES (3, 0);

Putting it into a file called setup_self.sql, we can do:

sqlite3 self.db < setup_self.sql

Python Code

from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy import create_engine

from sqlalchemy import Table, Column, Integer, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base

DeclBase = declarative_base()

task_task = Table('task_task', DeclBase.metadata,
                  Column('tid1', Integer, ForeignKey('task.id')),
                  Column('tid2', Integer, ForeignKey('task.id')))

Base = automap_base(DeclBase)

class Task(Base):
    __tablename__ = 'task'

    task_collection = relationship('Task', 
                                   secondary=task_task, 
                                   primaryjoin='Task.id==task_task.c.tid1',
                                   secondaryjoin='Task.id==task_task.c.tid2',
                                   backref='backward')

engine = create_engine("sqlite:///self.db")

Base.prepare(engine, reflect=True)

session = Session(engine)

task_0 = session.query(Task).filter_by(name ='task_0').first()
task_4 = session.query(Task).filter_by(name ='task_4').first()

print("task_0.task_collection = {}".format([x.name for x in task_0.task_collection]))
print("task_4.backward        = {}".format([x.name for x in task_4.backward]))

Results

task_0.task_collection = ['task_1', 'task_2']
task_4.backward        = ['task_2', 'task_3']

Using a Different Name

If you want to have a name other than task_collection, you need to use automap's function for overriding collection-relationship names:

name_for_collection_relationship(base, local_cls, referred_cls, constraint)

The arguments local_cls and referred_cls are instances of the mapped table classes. For a self-referential, many-to-many relationship, these are both the same class. We can use the arguments to build a key that allows us to identify overrides.

Here is an example implementation of this approach.

from sqlalchemy.ext.automap import automap_base, name_for_collection_relationship
from sqlalchemy.orm import Session
from sqlalchemy import create_engine

from sqlalchemy import Table, Column, Integer, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base


DeclBase = declarative_base()

task_task = Table('task_task', DeclBase.metadata,
                  Column('tid1', Integer, ForeignKey('task.id')),
                  Column('tid2', Integer, ForeignKey('task.id')))

Base = automap_base(DeclBase)

class Task(Base):
    __tablename__ = 'task'

    forward = relationship('Task', 
                           secondary=task_task, 
                           primaryjoin='Task.id==task_task.c.tid1',
                           secondaryjoin='Task.id==task_task.c.tid2',
                           backref='backward')


# A dictionary that maps relationship keys to a method name
OVERRIDES = {
    'Task_Task' : 'forward'
    }

def _name_for_collection_relationship(base, local_cls, referred_cls, constraint):

    # Build the key
    key = '{}_{}'.format(local_cls.__name__, referred_cls.__name__)

    # Did we have an override name?
    if key in OVERRIDES:
        # Yes, return it
        return OVERRIDES[key]

    # Default to the standard automap function
    return name_for_collection_relationship(base, local_cls, referred_cls, constraint)


engine = create_engine("sqlite:///self.db")

Base.prepare(engine, reflect=True, name_for_collection_relationship=_name_for_collection_relationship)

Note that the overriding of name_for_collection_relationship simply changes the name that automap uses for the relationship. In our case, the relationship is still being pre-defined by Task. But, the override tells automap to look for forward instead of task_collection, which it finds and therefore discontinues defining the relationship.

Other Approaches Considered

Under some circumstances, it would be nice if we could override the relationship names without having to pre-define the actual relationship. On first consideration, this should be possible using name_for_collection_relationship. However, I could not get this approach to work for self-referential, many-to-many relationships, due to a combination of two reasons.

  • name_for_collection_relationship and the related generate_relationship are called twice, once for each direction of the many-to-many relationship. In both cases, local_cls and referred_cls are the same, because of the self-referentiality. Moreover, the other arguments of name_for_collection_relationship are effectively equivalent. Therefore, we cannot, from the context of the function call, determine which direction we are overriding.

  • Here is the even-more surprising part of the problem. It appears we cannot even count on one direction happening before the other. In other words, the two calls to name_for_collection_relationship and generate_relationship are very similar. The argument that actually determines the directionality of the relationship is constraint, which is one of the two foreign-key constraints for the relationship; these constraints are loaded, from Base.metadata, into a variable called m2m_const. Herein lies the problem. The order that the constraints end up in m2m_const is nondeterministic, i.e. sometimes it will be one order; other times it will be the opposite (at least when using sqlite3). Because of this, the directionality of the relationship is nondeterministic.

On the other hand, when we pre-define the relationship, the following arguments create the necessary determinism.

primaryjoin='Task.id==task_task.c.tid1',
secondaryjoin='Task.id==task_task.c.tid2',

Of particular note, I actually tried to create a solution that simply overrode the relationship names without pre-defining it. It exhibited the described nondeterminism.

Final Thoughts

If you have a reasonable number of database tables that do not change often, I would suggest just using Declarative Base. It might be a little more work to set up, but it gives you more control.

like image 81
cryptoplex Avatar answered Oct 15 '22 12:10

cryptoplex