SQLAlchemy: how to generate (many-to-many) relationships with automap_base

As background: I'm creating an ORM based on the schema of an existing database, because the Python application won't be the "owner" of that database.

In this database there is a table called "task" and a table called "task_notBefore__task_relatedTasks". The latter is a many-to-many relation between different entries in the "task" table.

automap_base() can detect such relationships automatically, as described in its documentation. However, this fails in my case, and no relationship is built.

I then tried to create the relationship manually:

from sqlalchemy.ext.automap import automap_base
from sqlalchemy.ext.automap import generate_relationship
from sqlalchemy.orm import sessionmaker, interfaces, relationship
from sqlalchemy import create_engine

class DBConnection:
    def __init__(self, connection_url, **kwargs):
        self.engine = create_engine(connection_url, **kwargs)
        self._Base = automap_base()

        self._Base.prepare(self.engine, reflect=True)

        self.Task = self._Base.classes.task
        self.Order = self._Base.classes.order
        self.Poller = self._Base.classes.poller

        rel = generate_relationship(self._Base, interfaces.MANYTOMANY, relationship, 'related', self.Task, self.Task,
                                    secondary=self._Base.classes.task_notBefore__task_relatedTasks, backref='notBefore')

        self._Session = sessionmaker()
        self._Session.configure(bind=self.engine)

        self.session = self._Session()

However this still doesn't "do" anything: it doesn't add anything to the self.Task "class".

How would one do this?

asked Mar 01 '18 15:03

paul23

1 Answer

The primary problem here is not just that the relationship is many-to-many, but that it is a self-referential many-to-many relationship. Because automap derives relationship names from the mapped class names, it constructs the same name, e.g. task_collection, for both directions of the relationship, and the naming collision produces an error. This shortcoming of automap feels significant, since self-referential many-to-many relationships are not uncommon.
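The collision can be illustrated without a database. The sketch below mimics automap's default collection-naming rule (the lowercased name of the referred class plus "_collection"). Both `default_collection_name` and the bare `Task` class are stand-ins invented for illustration; they are not SQLAlchemy objects.

```python
def default_collection_name(local_cls, referred_cls):
    """Stand-in for automap's default naming rule (assumption: lowercased
    referred class name followed by '_collection')."""
    return referred_cls.__name__.lower() + "_collection"


class Task:
    """Placeholder for the automapped ``task`` class (hypothetical)."""


# In a self-referential many-to-many relationship both ends are Task,
# so the forward and backward directions get the same generated name.
forward_name = default_collection_name(Task, Task)
backward_name = default_collection_name(Task, Task)

print(forward_name, backward_name)  # task_collection task_collection
collision = forward_name == backward_name
```

With two different classes on either end the generated names would differ, which is why ordinary many-to-many tables do not hit this problem.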

Explicitly adding the relationships you want, using your own names, won't solve the problem because automap will still try to create the task_collection relationships. To deal with this issue, we need to override task_collection.

If you're okay with keeping the name task_collection for the forward direction of the relationship, we can simply pre-define the relationship--specifying whatever name we want for the backref. If automap finds the expected property already in place, it will assume the relationship is being overridden and not try to add it.

Here's a stripped-down example, along with an SQLite database for testing.

SQLite Database

CREATE TABLE task (
    id INTEGER, 
    name VARCHAR,
    PRIMARY KEY (id)
);

CREATE TABLE task_task (
    tid1 INTEGER,
    tid2 INTEGER,
    FOREIGN KEY(tid1) REFERENCES task(id),
    FOREIGN KEY(tid2) REFERENCES task(id)
);

-- Some sample data
INSERT INTO task VALUES (0, 'task_0');
INSERT INTO task VALUES (1, 'task_1');
INSERT INTO task VALUES (2, 'task_2');
INSERT INTO task VALUES (3, 'task_3');
INSERT INTO task VALUES (4, 'task_4');

INSERT INTO task_task VALUES (0, 1);
INSERT INTO task_task VALUES (0, 2);

INSERT INTO task_task VALUES (2, 4);
INSERT INTO task_task VALUES (3, 4);

INSERT INTO task_task VALUES (3, 0);

After saving this to a file called setup_self.sql, we can create the database with:

sqlite3 self.db < setup_self.sql
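If the sqlite3 command-line tool is not available, the same schema and sample data could be loaded with Python's standard-library sqlite3 module. This is a minimal sketch that uses an in-memory database and plain SQL joins to sanity-check both directions of the relation; the helper names `forward` and `backward` are chosen here for illustration only.

```python
import sqlite3

SCHEMA = """
CREATE TABLE task (
    id INTEGER,
    name VARCHAR,
    PRIMARY KEY (id)
);
CREATE TABLE task_task (
    tid1 INTEGER,
    tid2 INTEGER,
    FOREIGN KEY(tid1) REFERENCES task(id),
    FOREIGN KEY(tid2) REFERENCES task(id)
);
"""

# ":memory:" keeps the sketch repeatable; pass "self.db" to write the file instead.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO task VALUES (?, ?)",
                 [(i, "task_{}".format(i)) for i in range(5)])
conn.executemany("INSERT INTO task_task VALUES (?, ?)",
                 [(0, 1), (0, 2), (2, 4), (3, 4), (3, 0)])
conn.commit()


def forward(conn, name):
    """Names of tasks reached from `name` via tid1 -> tid2."""
    rows = conn.execute(
        "SELECT t2.name FROM task t1 "
        "JOIN task_task tt ON tt.tid1 = t1.id "
        "JOIN task t2 ON t2.id = tt.tid2 "
        "WHERE t1.name = ? ORDER BY t2.id", (name,))
    return [row[0] for row in rows]


def backward(conn, name):
    """Names of tasks that point at `name` via tid2 -> tid1."""
    rows = conn.execute(
        "SELECT t2.name FROM task t1 "
        "JOIN task_task tt ON tt.tid2 = t1.id "
        "JOIN task t2 ON t2.id = tt.tid1 "
        "WHERE t1.name = ? ORDER BY t2.id", (name,))
    return [row[0] for row in rows]


print(forward(conn, "task_0"))   # ['task_1', 'task_2']
print(backward(conn, "task_4"))  # ['task_2', 'task_3']
```

These are the same two lookups that the ORM relationships are expected to reproduce.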

Python Code

from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy import create_engine

from sqlalchemy import Table, Column, Integer, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base

DeclBase = declarative_base()

task_task = Table('task_task', DeclBase.metadata,
                  Column('tid1', Integer, ForeignKey('task.id')),
                  Column('tid2', Integer, ForeignKey('task.id')))

Base = automap_base(DeclBase)

class Task(Base):
    __tablename__ = 'task'

    task_collection = relationship('Task', 
                                   secondary=task_task, 
                                   primaryjoin='Task.id==task_task.c.tid1',
                                   secondaryjoin='Task.id==task_task.c.tid2',
                                   backref='backward')

engine = create_engine("sqlite:///self.db")

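# Reflect the database and finish the mapping. On SQLAlchemy 1.4+ the preferred
# spelling is Base.prepare(autoload_with=engine); reflect=True is deprecated there.
Base.prepare(engine, reflect=True)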

session = Session(engine)

task_0 = session.query(Task).filter_by(name='task_0').first()
task_4 = session.query(Task).filter_by(name='task_4').first()

print("task_0.task_collection = {}".format([x.name for x in task_0.task_collection]))
print("task_4.backward        = {}".format([x.name for x in task_4.backward]))

Results

task_0.task_collection = ['task_1', 'task_2']
task_4.backward        = ['task_2', 'task_3']

Using a Different Name

If you want to have a name other than task_collection, you need to use automap's function for overriding collection-relationship names:

name_for_collection_relationship(base, local_cls, referred_cls, constraint)

The arguments local_cls and referred_cls are the mapped classes themselves (not instances). For a self-referential, many-to-many relationship, both are the same class. We can use these arguments to build a key that identifies which relationships to rename.

Here is an example implementation of this approach.

from sqlalchemy.ext.automap import automap_base, name_for_collection_relationship
from sqlalchemy.orm import Session
from sqlalchemy import create_engine

from sqlalchemy import Table, Column, Integer, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base


DeclBase = declarative_base()

task_task = Table('task_task', DeclBase.metadata,
                  Column('tid1', Integer, ForeignKey('task.id')),
                  Column('tid2', Integer, ForeignKey('task.id')))

Base = automap_base(DeclBase)

class Task(Base):
    __tablename__ = 'task'

    forward = relationship('Task', 
                           secondary=task_task, 
                           primaryjoin='Task.id==task_task.c.tid1',
                           secondaryjoin='Task.id==task_task.c.tid2',
                           backref='backward')


# A dictionary that maps relationship keys to relationship names
OVERRIDES = {
    'Task_Task': 'forward',
}

def _name_for_collection_relationship(base, local_cls, referred_cls, constraint):

    # Build the key
    key = '{}_{}'.format(local_cls.__name__, referred_cls.__name__)

    # Did we have an override name?
    if key in OVERRIDES:
        # Yes, return it
        return OVERRIDES[key]

    # Default to the standard automap function
    return name_for_collection_relationship(base, local_cls, referred_cls, constraint)


engine = create_engine("sqlite:///self.db")

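# Pass the custom naming function to automap. On SQLAlchemy 1.4+ the preferred
# spelling is Base.prepare(autoload_with=engine, ...); reflect=True is deprecated there.
Base.prepare(engine, reflect=True, name_for_collection_relationship=_name_for_collection_relationship)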

Note that overriding name_for_collection_relationship only changes the name that automap uses for the relationship. In our case, the relationship is still pre-defined on Task. The override tells automap to look for forward instead of task_collection; it finds that attribute already in place and therefore does not generate a relationship of its own.

Other Approaches Considered

Under some circumstances, it would be nice if we could override the relationship names without having to pre-define the actual relationship. On first consideration, this should be possible using name_for_collection_relationship. However, I could not get this approach to work for self-referential, many-to-many relationships, due to a combination of two reasons.

1. name_for_collection_relationship and the related generate_relationship are called twice, once for each direction of the many-to-many relationship. In both calls, local_cls and referred_cls are the same, because the relationship is self-referential. Moreover, the other arguments of name_for_collection_relationship are effectively equivalent. Therefore we cannot determine, from the context of the function call, which direction we are overriding.
2. Here is the even more surprising part of the problem: it appears we cannot even count on one direction being processed before the other. The argument that actually determines the directionality of the relationship is constraint, which is one of the two foreign-key constraints for the relationship; these constraints are loaded from Base.metadata into a variable called m2m_const. Herein lies the problem. The order in which the constraints end up in m2m_const is nondeterministic, i.e. sometimes it is one order and other times the opposite (at least when using sqlite3). Because of this, the directionality of the relationship is nondeterministic.

On the other hand, when we pre-define the relationship, the following arguments create the necessary determinism.

primaryjoin='Task.id==task_task.c.tid1',
secondaryjoin='Task.id==task_task.c.tid2',

Of particular note, I actually tried to create a solution that simply overrode the relationship names without pre-defining it. It exhibited the described nondeterminism.

Final Thoughts

If you have a reasonable number of database tables that do not change often, I would suggest just using Declarative Base. It might be a little more work to set up, but it gives you more control.

answered Oct 15 '22 12:10

cryptoplex
