Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQLAlchemy and empty IN clause

I found out that SQLAlchemy translates

db.query(...).filter(A.id.in_(ids))

into

SELECT ...
FROM a
WHERE a.id != a.id

if ids is empty. That results in sequential scan on a table, which is obviously disastrous for performance.

The first question is: why? Why not just 1 = 0 or anything that does not require sequential scan?

The second, more important: is is there a commonly used workaround (other than if near every in_)?

I guess that in_ cannot be easily reimplemented to cover all cases without causing that problem, but I cannot be the first to face it and there may be some solution covering simple, common use cases of in_.

EDIT

SQLAlchemy logs a warning every time it happens:

"The IN-predicate on 'foo.bar' was invoked with an empty sequence. This results in a contradiction, which nonetheless can be expensive to evaluate. Consider alternative strategies for improved performance."

like image 455
Bartosz Marcinkowski Avatar asked Sep 11 '25 06:09

Bartosz Marcinkowski


2 Answers

(This is now mainly of historical interest, as SQLAlchemy fixed this bug long ago).

To answer the OP's question of "why", here is the FAQ entry (which I always find surprisingly difficult to find):

Why does .col.in_([]) produce col != col? Why not 1=0?

A little introduction to the issue. The IN operator in SQL, given a list of elements to compare against a column, generally does not accept an empty list, that is while it is valid to say:

column IN (1, 2, 3)

it’s not valid to say:

column IN ()

SQLAlchemy's Operators.in_() operator, when given an empty list, produces this expression:

column != column

As of version 0.6, it also produces a warning stating that a less efficient comparison operation will be rendered. This expression is the only one that is both database agnostic and produces correct results.

For example, the naive approach of "just evaluate to false, by comparing 1=0 or 1!=1", does not handle nulls properly. An expression like:

NOT column != column

will not return a row when column IS NULL, but an expression which does not take the column into account, such as:

NOT 1=0

will return a row.

As shown in this post, you may be able to use the ANY function to avoid this, since it's syntactically valid even for an empty list (but not supported on SQLite apparently). It's probably faster for large lists too, since it does less string mangling to build the query.

The performance issue with the in_ operator has recently been fixed and the fix will probably be in SQLAlchemy 1.2.0.

like image 121
qris Avatar answered Sep 13 '25 20:09

qris


I'm using:

if len(ids) > 0:
    db.query(...).where(A.id.in_(ids))
else:
    db.query(...).where(False)

I tried a .limit(0) instead of the .where(false) without success. There is some behind-the-scenes difference in the empty querysets that broke other stuff down the pipeline. This workaround, while could be faster, at least avoids your mentioned warning.

like image 43
TheGrimmScientist Avatar answered Sep 13 '25 20:09

TheGrimmScientist