SQLAlchemy and empty IN clause

Question

I found out that SQLAlchemy translates

db.query(...).filter(A.id.in_(ids))

into

SELECT ...
FROM a
WHERE a.id != a.id

if ids is empty. That results in sequential scan on a table, which is obviously disastrous for performance.

The first question is: why? Why not just 1 = 0 or anything that does not require sequential scan?

The second, more important: is is there a commonly used workaround (other than if near every in_)?

I guess that in_ cannot be easily reimplemented to cover all cases without causing that problem, but I cannot be the first to face it and there may be some solution covering simple, common use cases of in_.

EDIT

SQLAlchemy logs a warning every time it happens:

"The IN-predicate on 'foo.bar' was invoked with an empty sequence. This results in a contradiction, which nonetheless can be expensive to evaluate. Consider alternative strategies for improved performance."

qris · Accepted Answer

(This is now mainly of historical interest, as SQLAlchemy fixed this bug long ago).

To answer the OP's question of "why", here is the FAQ entry (which I always find surprisingly difficult to find):

Why does .col.in_([]) produce col != col? Why not 1=0?

A little introduction to the issue. The IN operator in SQL, given a list of elements to compare against a column, generally does not accept an empty list, that is while it is valid to say:
column IN (1, 2, 3)
it’s not valid to say:
column IN ()
SQLAlchemy's Operators.in_() operator, when given an empty list, produces this expression:
column != column
As of version 0.6, it also produces a warning stating that a less efficient comparison operation will be rendered. This expression is the only one that is both database agnostic and produces correct results.

For example, the naive approach of "just evaluate to false, by comparing 1=0 or 1!=1", does not handle nulls properly. An expression like:
NOT column != column
will not return a row when column IS NULL, but an expression which does not take the column into account, such as:
NOT 1=0
will return a row.

As shown in this post, you may be able to use the ANY function to avoid this, since it's syntactically valid even for an empty list (but not supported on SQLite apparently). It's probably faster for large lists too, since it does less string mangling to build the query.

The performance issue with the in_ operator has recently been fixed and the fix will probably be in SQLAlchemy 1.2.0.

TheGrimmScientist · Answer

I'm using:

if len(ids) > 0:
    db.query(...).where(A.id.in_(ids))
else:
    db.query(...).where(False)

I tried a .limit(0) instead of the .where(false) without success. There is some behind-the-scenes difference in the empty querysets that broke other stuff down the pipeline. This workaround, while could be faster, at least avoids your mentioned warning.

SQLAlchemy and empty IN clause

Tags:

python

sql

sqlalchemy

EDIT

Bartosz Marcinkowski

2 Answers

qris

TheGrimmScientist

Recent Activity

Donate For Us

SQLAlchemy and empty IN clause

Tags:

python

sql

sqlalchemy

EDIT

Bartosz Marcinkowski

2 Answers

qris

TheGrimmScientist

Related questions

Recent Activity

Donate For Us