
Building queries with a cross-database join using SQLAlchemy ORM and SQLite

I have two SQLite databases containing tables that I need to join using SQLAlchemy. For various reasons I cannot combine all the tables into one SQLite database. I am using the SQLAlchemy ORM. I have not been able to find a solution online that fits my specific case.

My question is in principle the same as SQLAlchemy error query join across database, but the original poster's problem was solved in a way that does not fit my use case.

My question to the wise people at Stack Overflow:

I want to emulate the following SQL query:

SELECT DISTINCT g.gene_symbol, o.orthofinder_id FROM eukarya.genes AS g JOIN annotations.orthofinder AS o ON g.gene_id=o.gene_id;

This query works fine in SQLiteStudio after attaching both database files.
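For reference, the attach-then-join behaviour described above can be reproduced outside SQLiteStudio with Python's standard `sqlite3` module. This is only a minimal sketch: the function name, the temporary file locations and the sample rows are made up for illustration, and the tables contain just the columns the query needs.

```python
import os
import sqlite3
import tempfile


def run_cross_database_join():
    """Create two throwaway SQLite files, attach both, and run the join."""
    workdir = tempfile.mkdtemp()
    eukarya_path = os.path.join(workdir, "eukarya_db.sqlite3")
    annotations_path = os.path.join(workdir, "eukarya_annotations_db.sqlite3")

    # Hypothetical sample data for the first database file.
    conn = sqlite3.connect(eukarya_path)
    conn.execute("CREATE TABLE genes (gene_id INTEGER PRIMARY KEY, gene_symbol TEXT)")
    conn.executemany("INSERT INTO genes VALUES (?, ?)", [(1, "geneA"), (2, "geneB")])
    conn.commit()
    conn.close()

    # Hypothetical sample data for the second file (one duplicate pair on purpose).
    conn = sqlite3.connect(annotations_path)
    conn.execute(
        "CREATE TABLE orthofinder "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, gene_id INTEGER, orthofinder_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO orthofinder (gene_id, orthofinder_id) VALUES (?, ?)",
        [(1, "OG0001"), (1, "OG0001"), (2, "OG0002")],
    )
    conn.commit()
    conn.close()

    # One connection that sees both files under the aliases used in the query.
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ? AS eukarya", (eukarya_path,))
    conn.execute("ATTACH DATABASE ? AS annotations", (annotations_path,))
    rows = conn.execute(
        "SELECT DISTINCT g.gene_symbol, o.orthofinder_id "
        "FROM eukarya.genes AS g "
        "JOIN annotations.orthofinder AS o ON g.gene_id = o.gene_id "
        "ORDER BY g.gene_symbol"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_cross_database_join())
```

The point of the sketch is that the `eukarya.` and `annotations.` prefixes only exist on a connection that has attached both files. A connection opened on just one of the files cannot see the other file's tables.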

The code I am currently using to describe the metadata:

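from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

eukarya_engine = create_engine('sqlite:///eukarya_db.sqlite3')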
annotations_engine = create_engine('sqlite:///eukarya_annotations_db.sqlite3')

meta = MetaData()  # This allows me to define cross database foreign keys

Eukarya = declarative_base(bind=eukarya_engine, metadata=meta)
Annotations = declarative_base(bind=annotations_engine, metadata=meta)
# I did the above in the hope that binding the engines this way
# would percolate through the schema, and that SQLAlchemy would be able
# to figure out which engine to use for each table.

class Genes(Eukarya):
  """SQLalchemy object representing the Genes table in the Eukarya database."""
  __tablename__ = 'genes'
  gene_id = Column(Integer, primary_key=True, unique=True)
  gene_symbol = Column(String(16), index=True)
  taxonomy_id = Column(Integer, ForeignKey(Species.taxonomy_id), index=True)
  original_gene_id = Column(String)

class Orthofinder(Annotations):
  """SQLalchemy object representing the Orthofinder table in the Annotations database."""
  __tablename__ = 'orthofinder'
  id = Column(Integer,primary_key=True, autoincrement=True)
  gene_id = Column(Integer, ForeignKey(Genes.gene_id), index=True)
  orthofinder_id = Column(String(10), index=True)

Session = sessionmaker()
session = Session(bind=eukarya_engine)

print(session.query(Genes.gene_symbol, Orthofinder.orthofinder_id).
      join(Orthofinder).all())

The last statement raises:

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: orthofinder [SQL: 'SELECT genes.gene_symbol AS genes_gene_symbol, orthofinder.orthofinder_id AS orthofinder_orthofinder_id \nFROM genes JOIN orthofinder ON genes.gene_id = orthofinder.gene_id']

I believe my troubles would be over if I could somehow bind both database engines to one session. But how? For two databases reachable through the same engine (e.g. two databases on one MySQL server) I could add __table_args__ = {'schema': 'annotations'} (as per Cross database join in sqlalchemy), but I am unable to figure out how to do this with SQLite.

I would prefer a solution that would allow users of my code to construct queries without having to know in which database each table resides.

Please help! And many thanks in advance!

John van Dam asked Nov 03 '17 10:11



1 Answer

Answer to my own question (thanks to Ilja for finding the solution):

I can define the engine like this:

engine = create_engine('sqlite://', echo=True)  # in-memory database to attach multiple SQLite databases to
engine.execute("attach database 'eukarya_db.sqlite3' as eukarya;")
engine.execute("attach database 'eukarya_annotations_db.sqlite3' as annotations;")

And then add

__table_args__ = {'schema': 'eukarya'}

or

__table_args__ = {'schema': 'annotations'}

to my table classes.

Works like a charm!
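Putting the pieces together, a self-contained sketch of this approach might look like the following. Several things here are assumptions rather than part of the original answer: the `ATTACH` statements are issued from a `connect` event listener instead of `engine.execute()` (which was removed in SQLAlchemy 2.0), the helper names `make_engine` and `cross_database_orm_join` are hypothetical, the database files are created empty in a temporary directory, and the sample rows are invented. It is written against SQLAlchemy 1.4 or later.

```python
import os
import tempfile

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Genes(Base):
    __tablename__ = "genes"
    __table_args__ = {"schema": "eukarya"}  # must match the ATTACH alias
    gene_id = Column(Integer, primary_key=True)
    gene_symbol = Column(String(16))


class Orthofinder(Base):
    __tablename__ = "orthofinder"
    __table_args__ = {"schema": "annotations"}  # must match the ATTACH alias
    id = Column(Integer, primary_key=True, autoincrement=True)
    gene_id = Column(Integer, ForeignKey("eukarya.genes.gene_id"))
    orthofinder_id = Column(String(10))


def make_engine(eukarya_path, annotations_path):
    """Hypothetical helper: in-memory engine with both files attached."""
    engine = create_engine("sqlite://")

    @event.listens_for(engine, "connect")
    def attach_databases(dbapi_connection, connection_record):
        # Runs for every new DBAPI connection, so each one sees both files.
        cursor = dbapi_connection.cursor()
        cursor.execute("ATTACH DATABASE ? AS eukarya", (eukarya_path,))
        cursor.execute("ATTACH DATABASE ? AS annotations", (annotations_path,))
        cursor.close()

    return engine


def cross_database_orm_join():
    workdir = tempfile.mkdtemp()
    engine = make_engine(
        os.path.join(workdir, "eukarya_db.sqlite3"),
        os.path.join(workdir, "eukarya_annotations_db.sqlite3"),
    )
    Base.metadata.create_all(engine)  # creates each table in its own file

    Session = sessionmaker(bind=engine)
    with Session() as session:
        # Invented sample rows, including one duplicate annotation.
        session.add_all([
            Genes(gene_id=1, gene_symbol="geneA"),
            Genes(gene_id=2, gene_symbol="geneB"),
            Orthofinder(gene_id=1, orthofinder_id="OG0001"),
            Orthofinder(gene_id=1, orthofinder_id="OG0001"),
            Orthofinder(gene_id=2, orthofinder_id="OG0002"),
        ])
        session.commit()
        rows = (
            session.query(Genes.gene_symbol, Orthofinder.orthofinder_id)
            # Explicit ON clause, to keep the sketch independent of FK inference.
            .join(Orthofinder, Genes.gene_id == Orthofinder.gene_id)
            .distinct()
            .order_by(Genes.gene_symbol)
            .all()
        )
    return [tuple(row) for row in rows]


if __name__ == "__main__":
    print(cross_database_orm_join())
```

Code that builds queries only touches `Genes` and `Orthofinder`. The schema name on each class is the only place that records which file a table lives in.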

John van Dam answered Sep 23 '22 12:09