Returning distinct rows in SQLAlchemy with SQLite

SQLAlchemy's Query.distinct method is behaving inconsistently:

>>> [tag.name for tag in session.query(Tag).all()]
[u'Male', u'Male', u'Ninja', u'Pirate']
>>> session.query(Tag).distinct(Tag.name).count()
4
>>> session.query(Tag.name).distinct().count()
3

So the second form gives the correct result but the first form does not. This appears to happen with SQLite but NOT with Postgres. I have a function which is passed a query object and applies a distinct clause to it, so it would be very difficult to rewrite everything to use the second approach above. Is there something obvious that I'm missing?

Eli Courtwright asked Jun 20 '13 20:06



2 Answers

According to the docs:

When present, the Postgresql dialect will render a DISTINCT ON (<expressions>) construct.

So, passing column expressions to distinct() works for PostgreSQL only (because it is the dialect that renders DISTINCT ON).

With SQLite, in the expression session.query(Tag).distinct(Tag.name).count(), SQLAlchemy ignores Tag.name and produces a query that is distinct on all fields:

SELECT DISTINCT tag.country_id AS tag_country_id, tag.name AS tag_name  FROM tag 

As you said, in your case distinct(Tag.name) is applied to the query for you, so instead of calling just count(), consider adding a group_by first:

session.query(Tag).distinct(Tag.name).group_by(Tag.name).count() 
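
To see why the extra group_by changes the count on SQLite, the underlying statements can be run directly with Python's built-in sqlite3 module. This is only a minimal sketch, not SQLAlchemy's exact output: the country_id values, the count_rows helper, and the unaliased count(*)-over-subquery wrapper are assumptions chosen to mirror the data in the question.

```python
import sqlite3

# In-memory table mirroring the question's data; the country_id values are
# an assumption (the two 'Male' rows must differ in some other column).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (country_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO tag (country_id, name) VALUES (?, ?)",
    [(1, "Male"), (2, "Male"), (1, "Ninja"), (1, "Pirate")],
)


def count_rows(select_sql):
    """Hypothetical helper: count the rows of a SELECT by wrapping it in a subquery."""
    return conn.execute(f"SELECT count(*) FROM ({select_sql})").fetchone()[0]


# DISTINCT over every selected column: both 'Male' rows survive.
distinct_all = count_rows("SELECT DISTINCT tag.country_id, tag.name FROM tag")

# Adding GROUP BY collapses the rows to one per name before DISTINCT applies.
distinct_grouped = count_rows(
    "SELECT DISTINCT tag.country_id, tag.name FROM tag GROUP BY tag.name"
)

print(distinct_all, distinct_grouped)
```

Note that SQLite accepts country_id here even though it is neither aggregated nor part of the GROUP BY; stricter databases may reject such a statement.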

Hope that helps.

alecxe answered Sep 30 '22 12:09


When you use session.query(Tag) you always query for the whole Tag object, so DISTINCT is applied across every column and won't deduplicate by name if your table contains other columns.

Let's assume there is an id column, then the query

sess.query(Tag).distinct(Tag.name) 

will produce:

SELECT DISTINCT tag.id AS tag_id, tag.name AS tag_name FROM tag 

The argument to the distinct clause is ignored completely.

If you really only want the distinct names from the table, you must explicitly select only the names:

sess.query(Tag.name).distinct() 

produces:

SELECT DISTINCT tag.name AS tag_name FROM tag 
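
The difference between the two statements can be checked outside SQLAlchemy with Python's built-in sqlite3 module. This is a rough sketch under assumptions: the table definition and sample rows are made up to match the question, and the ORDER BY is added only to make the printed order predictable.

```python
import sqlite3

# In-memory table with the assumed id column plus the names from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO tag (name) VALUES (?)",
    [("Male",), ("Male",), ("Ninja",), ("Pirate",)],
)

# Selecting the whole row: the unique id makes every row distinct.
whole_rows = conn.execute(
    "SELECT DISTINCT tag.id AS tag_id, tag.name AS tag_name FROM tag"
).fetchall()

# Selecting only the name: duplicates collapse.
names_only = [
    name
    for (name,) in conn.execute(
        "SELECT DISTINCT tag.name AS tag_name FROM tag ORDER BY tag_name"
    )
]

print(len(whole_rows))
print(names_only)
```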

mata answered Sep 30 '22 14:09