The default SQLAlchemy behavior for compiling in_
expressions is pathological for very large lists, and I want to create a custom, faster, compiler for the operator. It doesn't matter to the application if the solution is a new operator (i.e.: in_list_
) or if it overrides the default compiler for in_
. However, I haven't been able to find any documentation on how to do this specifically.
The subclassing guidelines for compilation extension don't include anything about operators, suggesting that is not the place to start. The documentation on redefining and creating new operators is focused on changing or creating new operator behavior, but the behavior of the operator is not the problem, just the compiler.
Here is a very non-working example of what I'm trying to accomplish:
from sqlalchemy.types import TypeEngine
class in_list_(TypeEngine.Comparator):
pass
@compiles(in_list_)
def in_list_impl(element, compiler, **kwargs):
return "IN ('Now', 'I', 'can', 'inline', 'the', 'list')"
And then in an expression:
select([mytable.c.x, mytable.c.y]).where(mytable.c.x.in_list_(long_list))
Using IN
for very large lists is indeed pathological, and you might be better served using a temporary table and IN
against a subquery or a join. But the question was "how to override compiler output for specific operator". In case of binary operators such as IN
and NOT IN
what you need to override is how SQLAlchemy handles compiling BinaryExpression
s:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.elements import BinaryExpression
from sqlalchemy.sql.operators import in_op, notin_op
def visit_in_op_binary(compiler, binary, operator, **kw):
return "%s IN %s" % (
compiler.process(binary.left, **kw),
compiler.process(binary.right, **{**kw, "literal_binds": True}))
def visit_notin_op_binary(compiler, binary, operator, **kw):
return "%s NOT IN %s" % (
compiler.process(binary.left, **kw),
compiler.process(binary.right, **{**kw, "literal_binds": True}))
@compiles(BinaryExpression)
def compile_binary(binary, compiler, override_operator=None, **kw):
operator = override_operator or binary.operator
if operator is in_op:
return visit_in_op_binary(
compiler, binary, operator, override_operator=override_operator,
**kw)
if operator is notin_op:
return visit_notin_op_binary(
compiler, binary, operator, override_operator=override_operator,
**kw)
return compiler.visit_binary(binary, override_operator=override_operator, **kw)
Note that simply producing the binary expression containing the grouping and clause list of bind params takes a surprisingly lot of time for very large lists, not to mention compiling all that even if using literal binds, so you might not observe significant performance gains. On the other hand many implementations have limits on how many placeholders / parameters you can use in a statement, and so inlining the binds allows such queries to run at all.
If on the other hand your list does fit within the limits set by your implementation (Postgresql seems to be only limited by available RAM), you may not need any compiler workarounds with a recent enough SQLAlchemy; use expanding bind parameters instead:
In [15]: %%time
...: session.query(Foo).\
...: filter(Foo.data.in_(range(250000))).\
...: all()
...:
CPU times: user 5.09 s, sys: 91.9 ms, total: 5.18 s
Wall time: 5.18 s
Out[15]: []
In [16]: %%time
...: session.query(Foo).\
...: filter(Foo.data.in_(bindparam('xs', range(250000), expanding=True))).\
...: all()
...:
CPU times: user 310 ms, sys: 8.05 ms, total: 318 ms
Wall time: 317 ms
Out[16]: []
And as mentioned in comments, in version 1.4 an expanding bindparam
will support literal execution out of the box:
In [4]: session.query(Foo).\
...: filter(Foo.data.in_(
...: bindparam('xs', range(10), expanding=True, literal_execute=True))).\
...: all()
2019-09-07 20:35:04,560 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-09-07 20:35:04,561 INFO sqlalchemy.engine.base.Engine SELECT foo.id AS foo_id, foo.data AS foo_data
FROM foo
WHERE foo.data IN (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
2019-09-07 20:35:04,561 INFO sqlalchemy.engine.base.Engine ()
Out[4]: []
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With