As a background: I'm creating an ORM based on a schema of an already existing database. - This due to the fact that the python application won't be the "owner" of said database.
Now in this database there is a table called "task
" and a table called "task_notBefore__task_relatedTasks
" - this latter is a many-to-many relation between different entries in the "task
" table.
now automap_base()
has an automated detection of these relationships as described here. However this fails for my case, and no relationship is being build.
I then try to manually create the relationship:
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.ext.automap import generate_relationship
from sqlalchemy.orm import sessionmaker, interfaces, relationship
from sqlalchemy import create_engine
class DBConnection:
def __init__(self, connection_url, **kwargs):
self.engine = create_engine(connection_url, **kwargs)
self._Base = automap_base()
self._Base.prepare(self.engine, reflect=True)
self.Task = self._Base.classes.task
self.Order = self._Base.classes.order
self.Poller = self._Base.classes.poller
rel = generate_relationship(self._Base, interfaces.MANYTOMANY, relationship, 'related', self.Task, self.Task,
secondary=self._Base.classes.task_notBefore__task_relatedTasks, backref='notBefore')
self._Session = sessionmaker()
self._Session.configure(bind=self.engine)
self.session = self._Session()
However this still doesn't "do" anything: it doesn't add anything to the self.Task
"class".
How would one do this?
Python Flask and SQLAlchemy ORM Many to Many relationship between two tables is achieved by adding an association table such that it has two foreign keys - one from each table's primary key.
Define an extension to the sqlalchemy. ext. declarative system which automatically generates mapped classes and relationships from a database schema, typically though not necessarily one which is reflected.
The back_populates argument tells SqlAlchemy which column to link with when it joins the two tables. It allows you to access the linked records as a list with something like Parent.
The declarative_base() base class contains a MetaData object where newly defined Table objects are collected. This object is intended to be accessed directly for MetaData -specific operations. Such as, to issue CREATE statements for all tables: engine = create_engine('sqlite://') Base.
The primary problem in this case is not just the many-to-many relationship, but the fact that it's a self-referential, many-to-many relationship. Because automap
is simply translating the mapped class names to relationship names, it constructs the same name, e.g. task_collection
, for both directions of the relationship, and the naming collision generates the error. This shortcoming of automap
feels significant in that self-referential, many-to-many relationships are not uncommon.
Explicitly adding the relationships you want, using your own names, won't solve the problem because automap
will still try to create the task_collection
relationships. To deal with this issue, we need to override task_collection
.
If you're okay with keeping the name task_collection
for the forward direction of the relationship, we can simply pre-define the relationship--specifying whatever name we want for the backref
. If automap
finds the expected property already in place, it will assume the relationship is being overridden and not try to add it.
Here's a stripped down example, along with the an sqlite
database for testing.
CREATE TABLE task (
id INTEGER,
name VARCHAR,
PRIMARY KEY (id)
);
CREATE TABLE task_task (
tid1 INTEGER,
tid2 INTEGER,
FOREIGN KEY(tid1) REFERENCES task(id),
FOREIGN KEY(tid2) REFERENCES task(id)
);
-- Some sample data
INSERT INTO task VALUES (0, 'task_0');
INSERT INTO task VALUES (1, 'task_1');
INSERT INTO task VALUES (2, 'task_2');
INSERT INTO task VALUES (3, 'task_3');
INSERT INTO task VALUES (4, 'task_4');
INSERT INTO task_task VALUES (0, 1);
INSERT INTO task_task VALUES (0, 2);
INSERT INTO task_task VALUES (2, 4);
INSERT INTO task_task VALUES (3, 4);
INSERT INTO task_task VALUES (3, 0);
Putting it into a file called setup_self.sql
, we can do:
sqlite3 self.db < setup_self.sql
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, Integer, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base
DeclBase = declarative_base()
task_task = Table('task_task', DeclBase.metadata,
Column('tid1', Integer, ForeignKey('task.id')),
Column('tid2', Integer, ForeignKey('task.id')))
Base = automap_base(DeclBase)
class Task(Base):
__tablename__ = 'task'
task_collection = relationship('Task',
secondary=task_task,
primaryjoin='Task.id==task_task.c.tid1',
secondaryjoin='Task.id==task_task.c.tid2',
backref='backward')
engine = create_engine("sqlite:///self.db")
Base.prepare(engine, reflect=True)
session = Session(engine)
task_0 = session.query(Task).filter_by(name ='task_0').first()
task_4 = session.query(Task).filter_by(name ='task_4').first()
print("task_0.task_collection = {}".format([x.name for x in task_0.task_collection]))
print("task_4.backward = {}".format([x.name for x in task_4.backward]))
task_0.task_collection = ['task_1', 'task_2']
task_4.backward = ['task_2', 'task_3']
If you want to have a name other than task_collection
, you need to use automap
's function for overriding collection-relationship names:
name_for_collection_relationship(base, local_cls, referred_cls, constraint)
The arguments local_cls
and referred_cls
are instances of the mapped table classes. For a self-referential, many-to-many relationship, these are both the same class. We can use the arguments to build a key that allows us to identify overrides.
Here is an example implementation of this approach.
from sqlalchemy.ext.automap import automap_base, name_for_collection_relationship
from sqlalchemy.orm import Session
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, Integer, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base
DeclBase = declarative_base()
task_task = Table('task_task', DeclBase.metadata,
Column('tid1', Integer, ForeignKey('task.id')),
Column('tid2', Integer, ForeignKey('task.id')))
Base = automap_base(DeclBase)
class Task(Base):
__tablename__ = 'task'
forward = relationship('Task',
secondary=task_task,
primaryjoin='Task.id==task_task.c.tid1',
secondaryjoin='Task.id==task_task.c.tid2',
backref='backward')
# A dictionary that maps relationship keys to a method name
OVERRIDES = {
'Task_Task' : 'forward'
}
def _name_for_collection_relationship(base, local_cls, referred_cls, constraint):
# Build the key
key = '{}_{}'.format(local_cls.__name__, referred_cls.__name__)
# Did we have an override name?
if key in OVERRIDES:
# Yes, return it
return OVERRIDES[key]
# Default to the standard automap function
return name_for_collection_relationship(base, local_cls, referred_cls, constraint)
engine = create_engine("sqlite:///self.db")
Base.prepare(engine, reflect=True, name_for_collection_relationship=_name_for_collection_relationship)
Note that the overriding of name_for_collection_relationship
simply changes the name that automap
uses for the relationship. In our case, the relationship is still being pre-defined by Task
. But, the override tells automap
to look for forward
instead of task_collection
, which it finds and therefore discontinues defining the relationship.
Under some circumstances, it would be nice if we could override the relationship names without having to pre-define the actual relationship. On first consideration, this should be possible using name_for_collection_relationship
. However, I could not get this approach to work for self-referential, many-to-many relationships, due to a combination of two reasons.
name_for_collection_relationship
and the related generate_relationship
are called twice, once for each direction of the many-to-many relationship. In both cases, local_cls
and referred_cls
are the same, because of the self-referentiality. Moreover, the other arguments of name_for_collection_relationship
are effectively equivalent. Therefore, we cannot, from the context of the function call, determine which direction we are overriding.
Here is the even-more surprising part of the problem. It appears we cannot even count on one direction happening before the other. In other words, the two calls to name_for_collection_relationship
and generate_relationship
are very similar. The argument that actually determines the directionality of the relationship is constraint
, which is one of the two foreign-key constraints for the relationship; these constraints are loaded, from Base.metadata
, into a variable called m2m_const
. Herein lies the problem. The order that the constraints end up in m2m_const
is nondeterministic, i.e. sometimes it will be one order; other times it will be the opposite (at least when using sqlite3
). Because of this, the directionality of the relationship is nondeterministic.
On the other hand, when we pre-define the relationship, the following arguments create the necessary determinism.
primaryjoin='Task.id==task_task.c.tid1',
secondaryjoin='Task.id==task_task.c.tid2',
Of particular note, I actually tried to create a solution that simply overrode the relationship names without pre-defining it. It exhibited the described nondeterminism.
If you have a reasonable number of database tables that do not change often, I would suggest just using Declarative Base. It might be a little more work to set up, but it gives you more control.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With