I have two SQLite databases containing tables that I need to join using SQLAlchemy. For various reasons I cannot combine all the tables into one SQLite database. I am using the SQLAlchemy ORM. I have not been able to find any solution online that fits my specific case.
My question is in principle the same as "SQLAlchemy error query join across database", but the original poster's problem was solved in a way that does not fit my use case.
My question to the wise people at Stack Overflow:
I want to emulate the following SQL query:
SELECT DISTINCT g.gene_symbol, o.orthofinder_id FROM eukarya.genes AS g JOIN annotations.orthofinder AS o ON g.gene_id=o.gene_id;
This query works fine in SQLiteStudio with both database files attached.
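For reference, the same attach-then-join that SQLiteStudio performs can be reproduced without SQLAlchemy using Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the tables are trimmed to the columns the query uses, and the helper names make_sample_dbs and cross_db_query, the sample rows, and the orthogroup IDs are all hypothetical.

```python
import os
import sqlite3
import tempfile


def make_sample_dbs(directory):
    """Create two small files shaped like the question's tables (hypothetical sample data)."""
    eukarya_path = os.path.join(directory, "eukarya_db.sqlite3")
    annotations_path = os.path.join(directory, "eukarya_annotations_db.sqlite3")

    con = sqlite3.connect(eukarya_path)
    con.execute("CREATE TABLE genes (gene_id INTEGER PRIMARY KEY, gene_symbol TEXT)")
    con.executemany("INSERT INTO genes VALUES (?, ?)", [(1, "TP53"), (2, "BRCA1")])
    con.commit()
    con.close()

    con = sqlite3.connect(annotations_path)
    con.execute(
        "CREATE TABLE orthofinder (id INTEGER PRIMARY KEY, gene_id INTEGER, orthofinder_id TEXT)"
    )
    # gene 1 appears twice so that DISTINCT has something to collapse
    con.executemany(
        "INSERT INTO orthofinder (gene_id, orthofinder_id) VALUES (?, ?)",
        [(1, "OG0000001"), (1, "OG0000001"), (2, "OG0000002")],
    )
    con.commit()
    con.close()
    return eukarya_path, annotations_path


def cross_db_query(eukarya_path, annotations_path):
    """Attach both files to one in-memory connection and run the question's query."""
    con = sqlite3.connect(":memory:")
    try:
        # The file name can be a bound parameter; the schema alias cannot.
        con.execute("ATTACH DATABASE ? AS eukarya", (eukarya_path,))
        con.execute("ATTACH DATABASE ? AS annotations", (annotations_path,))
        rows = con.execute(
            "SELECT DISTINCT g.gene_symbol, o.orthofinder_id "
            "FROM eukarya.genes AS g "
            "JOIN annotations.orthofinder AS o ON g.gene_id = o.gene_id"
        ).fetchall()
    finally:
        con.close()
    return sorted(rows)


with tempfile.TemporaryDirectory() as tmp:
    print(cross_db_query(*make_sample_dbs(tmp)))
    # [('BRCA1', 'OG0000002'), ('TP53', 'OG0000001')]
```

The key point is that both files hang off a single connection, so SQLite sees one namespace (main, eukarya, annotations) and can join across it.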
The code I am currently using to describe the metadata:
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

eukarya_engine = create_engine('sqlite:///eukarya_db.sqlite3')
annotations_engine = create_engine('sqlite:///eukarya_annotations_db.sqlite3')
meta = MetaData()  # This allows me to define cross-database foreign keys
Eukarya = declarative_base(bind=eukarya_engine, metadata=meta)
Annotations = declarative_base(bind=annotations_engine, metadata=meta)
# I did the above in the hope that binding the engines this way
# would percolate through the schema, and SQLAlchemy would be able to
# figure out which engine to use for each table.
# (Note: bind= is stored on the shared MetaData, so the second call replaces the first.)

class Genes(Eukarya):
    """SQLAlchemy object representing the Genes table in the Eukarya database."""
    __tablename__ = 'genes'
    gene_id = Column(Integer, primary_key=True, unique=True)
    gene_symbol = Column(String(16), index=True)
    # Species is another mapped class in the Eukarya database (not shown here)
    taxonomy_id = Column(Integer, ForeignKey(Species.taxonomy_id), index=True)
    original_gene_id = Column(String)

class Orthofinder(Annotations):
    """SQLAlchemy object representing the Orthofinder table in the Annotations database."""
    __tablename__ = 'orthofinder'
    id = Column(Integer, primary_key=True, autoincrement=True)
    gene_id = Column(Integer, ForeignKey(Genes.gene_id), index=True)
    orthofinder_id = Column(String(10), index=True)

Session = sessionmaker()
session = Session(bind=eukarya_engine)
print(session.query(Genes.gene_symbol, Orthofinder.orthofinder_id)
      .join(Orthofinder).all())
The last print statement raises:
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: orthofinder [SQL: 'SELECT genes.gene_symbol AS genes_gene_symbol, orthofinder.orthofinder_id AS orthofinder_orthofinder_id \nFROM genes JOIN orthofinder ON genes.gene_id = orthofinder.gene_id']
I believe my troubles would be over if I could somehow bind both database engines to one session. But how?
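Binding both engines to one session is in fact possible: Session accepts a binds mapping from mapped classes to engines. As far as I can tell, though, that only routes each statement as a whole to one engine; a JOIN is a single SQL statement sent over a single connection, so it still cannot see the table stored in the other file. Below is a minimal sketch of that behaviour, assuming SQLAlchemy 1.4 or later. The trimmed-down table classes, the sample rows, and the helper name demo_binds are hypothetical.

```python
import os
import tempfile

from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.exc import OperationalError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Genes(Base):  # trimmed-down stand-in for the question's Genes class
    __tablename__ = "genes"
    gene_id = Column(Integer, primary_key=True)
    gene_symbol = Column(String(16))


class Orthofinder(Base):  # trimmed-down stand-in for the question's Orthofinder class
    __tablename__ = "orthofinder"
    id = Column(Integer, primary_key=True)
    gene_id = Column(Integer)
    orthofinder_id = Column(String(10))


def demo_binds(directory):
    eukarya_engine = create_engine("sqlite:///" + os.path.join(directory, "eukarya_db.sqlite3"))
    annotations_engine = create_engine(
        "sqlite:///" + os.path.join(directory, "eukarya_annotations_db.sqlite3")
    )
    # Each table exists only in its own file.
    Genes.__table__.create(eukarya_engine)
    Orthofinder.__table__.create(annotations_engine)

    result = {}
    # binds: each statement goes to one engine, chosen from the mapped class it targets
    with Session(binds={Genes: eukarya_engine, Orthofinder: annotations_engine}) as session:
        session.add_all([Genes(gene_id=1, gene_symbol="TP53"),
                         Orthofinder(gene_id=1, orthofinder_id="OG0000001")])
        session.commit()  # commits on both engines

        # Single-table queries work: each is sent to the right file.
        result["genes"] = session.execute(select(Genes.gene_symbol)).scalars().all()
        result["orthofinder"] = session.execute(select(Orthofinder.orthofinder_id)).scalars().all()

        # The join is one statement on one connection, so one of the tables is missing.
        join = (
            select(Genes.gene_symbol, Orthofinder.orthofinder_id)
            .select_from(Genes)
            .join(Orthofinder, Genes.gene_id == Orthofinder.gene_id)
        )
        try:
            session.execute(join).all()
            result["join_error"] = None
        except OperationalError as exc:
            result["join_error"] = str(exc.orig)  # e.g. "no such table: orthofinder"

    eukarya_engine.dispose()
    annotations_engine.dispose()
    return result


with tempfile.TemporaryDirectory() as tmp:
    print(demo_binds(tmp))
```

So per-class binds answer "which engine for which table", but not "join across engines"; for that, both files need to be attached to the same connection.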
For two databases served by the same engine (e.g. two databases on one MySQL server) I could add __table_args__ = {'schema': 'annotations'}
(as per "Cross database join in sqlalchemy"), but I cannot figure out how to do the equivalent with SQLite.
I would prefer a solution that would allow users of my code to construct queries without having to know in which database each table resides.
Please help! And many thanks in advance!
The ORM provided by SQLAlchemy sits between the SQLite database and your Python program and transforms the data flow between the database engine and Python objects. SQLAlchemy allows you to think in terms of objects and still retain the powerful features of a database engine.
In the classic (1.x-style) API, all SELECT statements generated by the SQLAlchemy ORM are constructed by the Query object. It provides a generative interface: each successive call returns a new Query object, a copy of the former with additional criteria and options attached.
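A small sketch of what "generative" means in practice, assuming SQLAlchemy 1.4 or later (where Query is still available, though 2.0 labels it legacy in favour of select()). The single-table Genes class and its two rows are hypothetical stand-ins.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Genes(Base):  # minimal stand-in for the question's Genes table
    __tablename__ = "genes"
    gene_id = Column(Integer, primary_key=True)
    gene_symbol = Column(String(16))


engine = create_engine("sqlite://")  # throwaway in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Genes(gene_symbol="TP53"), Genes(gene_symbol="BRCA1")])
    session.commit()

    all_genes = session.query(Genes)                            # a Query object
    tp53_only = all_genes.filter(Genes.gene_symbol == "TP53")  # a new Query; all_genes is unchanged
    counts = (all_genes.count(), tp53_only.count())
    same_object = all_genes is tp53_only

print(counts, same_object)  # (2, 1) False
```

Because filter() returns a copy, a base query can be built once and refined in several directions without the refinements leaking into each other.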
Interestingly, querying with bare sqlite3 is still about 3 times faster than using SQLAlchemy Core; I guess that is the price you pay for getting a ResultProxy back instead of a bare sqlite3 row. SQLAlchemy Core, in turn, is about 8 times faster than the ORM, so querying through the ORM is a lot slower no matter what.
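Ratios like these depend on the data, the Python and SQLAlchemy versions, and the machine, so here is a rough sketch for measuring them yourself. It assumes SQLAlchemy 1.4 or later; the Genes table and its 10,000 rows are made up, and no particular timings are implied.

```python
import sqlite3
import timeit

from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Genes(Base):  # minimal stand-in table for the benchmark
    __tablename__ = "genes"
    gene_id = Column(Integer, primary_key=True)
    gene_symbol = Column(String(16))


ROWS = [(i, f"GENE{i}") for i in range(10_000)]  # made-up data

# SQLAlchemy side: in-memory database; the pool keeps one connection per thread, so data persists
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(Genes.__table__.insert(), [{"gene_id": i, "gene_symbol": s} for i, s in ROWS])

# Bare sqlite3 side: a separate in-memory database with the same rows
raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE genes (gene_id INTEGER PRIMARY KEY, gene_symbol TEXT)")
raw.executemany("INSERT INTO genes VALUES (?, ?)", ROWS)
raw.commit()


def fetch_raw():
    return raw.execute("SELECT gene_id, gene_symbol FROM genes").fetchall()


def fetch_core():
    with engine.connect() as conn:
        return conn.execute(select(Genes.__table__)).fetchall()


def fetch_orm():
    with Session(engine) as session:
        return session.execute(select(Genes)).scalars().all()


for name, fetch in [("sqlite3", fetch_raw), ("Core", fetch_core), ("ORM", fetch_orm)]:
    seconds = timeit.timeit(fetch, number=5)
    print(f"{name:8s} {seconds:.3f} s for 5 fetches of {len(fetch())} rows")
```

The ORM variant builds a full Genes object per row, which is where most of the extra cost is expected to come from.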
If you want to view your data in a more schema-centric view (as used in SQL), use Core. If you have data for which business objects are not needed, use Core. If you view your data as business objects, use ORM. If you are building a quick prototype, use ORM.
Answer to my own question (thanks to Ilja for finding the solution):
I can define the engine like this:
from sqlalchemy import create_engine

engine = create_engine('sqlite://', echo=True)  # in-memory database to attach multiple SQLite databases to
with engine.connect() as conn:  # engine.execute() no longer exists in SQLAlchemy 2.0
    conn.exec_driver_sql("attach database 'eukarya_db.sqlite3' as eukarya;")
    conn.exec_driver_sql("attach database 'eukarya_annotations_db.sqlite3' as annotations;")
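An ATTACH only affects the one DBAPI connection it runs on. If the engine ever opens another connection (another thread, a file-based main database, a pool recycle), that connection would not see the attached files. One way to cover every connection is SQLAlchemy's "connect" pool event. A minimal sketch, assuming SQLAlchemy 1.4 or later; the function name make_attached_engine is hypothetical, and the two files are simply created empty in a temporary directory for the demonstration.

```python
import os
import tempfile

from sqlalchemy import create_engine, event, text


def make_attached_engine(eukarya_path, annotations_path):
    """Return an in-memory engine whose connections all have both files attached."""
    engine = create_engine("sqlite://")

    @event.listens_for(engine, "connect")
    def attach_databases(dbapi_connection, connection_record):
        # Called once for every new DBAPI connection the pool opens.
        cursor = dbapi_connection.cursor()
        cursor.execute("ATTACH DATABASE ? AS eukarya", (eukarya_path,))
        cursor.execute("ATTACH DATABASE ? AS annotations", (annotations_path,))
        cursor.close()

    return engine


with tempfile.TemporaryDirectory() as tmp:
    engine = make_attached_engine(
        os.path.join(tmp, "eukarya_db.sqlite3"),
        os.path.join(tmp, "eukarya_annotations_db.sqlite3"),
    )
    with engine.connect() as conn:
        # PRAGMA database_list yields (seq, name, file) for each database on the connection
        attached = [row[1] for row in conn.execute(text("PRAGMA database_list"))]
    engine.dispose()

print(attached)
```

The printed list should contain main, eukarya and annotations, which is a quick way to confirm the aliases match the schema names used in the table classes.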
And then add
__table_args__ = {'schema': 'eukarya'}
or
__table_args__ = {'schema': 'annotations'}
to the corresponding table classes.
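Putting the pieces together, here is a sketch of the whole approach end to end, assuming SQLAlchemy 1.4 or later. It attaches the files through a "connect" event rather than a one-off execute so that it also runs on 2.0. The table classes are trimmed to the columns the query needs; the sample data and the helper names make_engine, build_sample_files and gene_orthogroups are hypothetical.

```python
import os
import sqlite3
import tempfile

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()  # one MetaData holds the tables of both files


class Genes(Base):
    __tablename__ = "genes"
    __table_args__ = {"schema": "eukarya"}  # must match the ATTACH alias
    gene_id = Column(Integer, primary_key=True)
    gene_symbol = Column(String(16))


class Orthofinder(Base):
    __tablename__ = "orthofinder"
    __table_args__ = {"schema": "annotations"}
    id = Column(Integer, primary_key=True)
    # Schema-qualified target: SQLAlchemy uses it to infer the JOIN condition;
    # SQLite itself cannot enforce a foreign key across attached databases.
    gene_id = Column(Integer, ForeignKey("eukarya.genes.gene_id"))
    orthofinder_id = Column(String(10))


def make_engine(eukarya_path, annotations_path):
    engine = create_engine("sqlite://")

    @event.listens_for(engine, "connect")
    def attach(dbapi_connection, connection_record):
        cursor = dbapi_connection.cursor()
        cursor.execute("ATTACH DATABASE ? AS eukarya", (eukarya_path,))
        cursor.execute("ATTACH DATABASE ? AS annotations", (annotations_path,))
        cursor.close()

    return engine


def build_sample_files(directory):
    """Create two small, hypothetical database files shaped like the question's."""
    eukarya_path = os.path.join(directory, "eukarya_db.sqlite3")
    annotations_path = os.path.join(directory, "eukarya_annotations_db.sqlite3")
    con = sqlite3.connect(eukarya_path)
    con.execute("CREATE TABLE genes (gene_id INTEGER PRIMARY KEY, gene_symbol TEXT)")
    con.executemany("INSERT INTO genes VALUES (?, ?)", [(1, "TP53"), (2, "BRCA1")])
    con.commit()
    con.close()
    con = sqlite3.connect(annotations_path)
    con.execute(
        "CREATE TABLE orthofinder (id INTEGER PRIMARY KEY, gene_id INTEGER, orthofinder_id TEXT)"
    )
    con.executemany(
        "INSERT INTO orthofinder (gene_id, orthofinder_id) VALUES (?, ?)",
        [(1, "OG0000001"), (1, "OG0000001"), (2, "OG0000002")],
    )
    con.commit()
    con.close()
    return eukarya_path, annotations_path


def gene_orthogroups(session):
    """ORM equivalent of the question's SELECT DISTINCT ... JOIN query."""
    stmt = (
        select(Genes.gene_symbol, Orthofinder.orthofinder_id)
        .select_from(Genes)
        .join(Orthofinder)  # ON clause inferred from the ForeignKey above
        .distinct()
        .order_by(Genes.gene_symbol)
    )
    return [tuple(row) for row in session.execute(stmt)]


with tempfile.TemporaryDirectory() as tmp:
    engine = make_engine(*build_sample_files(tmp))
    with Session(engine) as session:
        print(gene_orthogroups(session))  # [('BRCA1', 'OG0000002'), ('TP53', 'OG0000001')]
    engine.dispose()
```

Callers only work with Genes and Orthofinder; which file each table lives in is decided once, by the schema name in __table_args__ together with the ATTACH alias, so users of the code can build queries without knowing where each table resides.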
Works like a charm!