
Django query filter using large array of ids in Postgres DB

I want to run a query from Django against my PostgreSQL database. When I filter the query using a large array of ids, it is very slow and takes up to 70 s.

While looking for an answer I found this post, which gives a solution to my problem: replace the ARRAY[ids] in the IN clause with VALUES (id1), (id2), ...

I tested the solution with a raw query in pgAdmin, and the query went from 70 s to 300 ms.
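To make the rewrite concrete, here is a minimal sketch of how the VALUES form could be generated in Python. The helper name `build_values_filter` and the table name `myapp_mymodel` are hypothetical; the table and column are interpolated directly into the SQL, so the sketch assumes they are trusted identifiers, while the ids are passed as ordinary query parameters.

```python
def build_values_filter(table, column, ids):
    """Return (sql, params) that filter `column` against a VALUES list.

    `table` and `column` are hypothetical, trusted identifiers (never user
    input); the ids themselves are passed as query parameters.
    """
    ids = list(ids)
    if not ids:
        raise ValueError("ids must not be empty: VALUES needs at least one row")
    # One "(%s)" row per id, giving: VALUES (%s), (%s), ...
    rows = ", ".join(["(%s)"] * len(ids))
    sql = "SELECT * FROM {t} WHERE {c} IN (VALUES {rows})".format(
        t=table, c=column, rows=rows
    )
    return sql, ids


sql, params = build_values_filter("myapp_mymodel", "id", [1, 2, 3])
print(sql)  # SELECT * FROM myapp_mymodel WHERE id IN (VALUES (%s), (%s), (%s))
```

The resulting pair can be handed to anything that accepts SQL plus a parameter list, for example a DB-API cursor's `execute(sql, params)`.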

How can I do the same command (i.e. not using an array of ids but a query with VALUES) in Django?

poolet asked Oct 13 '17 16:10



2 Answers

Building on @erwin-brandstetter's answer, I found a solution using a custom lookup:

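from django.db.models import Lookup
from django.db.models.fields import Field


@Field.register_lookup
class EfficientInLookup(Lookup):
    """Render `lhs IN (SELECT unnest(<array>))` instead of a long IN list."""

    lookup_name = "ineff"
    # Skip the field's get_prep_value() on the right-hand side; an integer
    # field would otherwise try to convert the whole list with int().
    prepare_rhs = False

    def as_sql(self, compiler, connection):
        lhs, lhs_params = self.process_lhs(compiler, connection)
        rhs, rhs_params = self.process_rhs(compiler, connection)
        # Params may be lists or tuples depending on the Django version.
        params = list(lhs_params) + list(rhs_params)
        return "%s IN (SELECT unnest(%s))" % (lhs, rhs), params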

This lets you filter like this:

MyModel.objects.filter(id__ineff=<list-of-values>)
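To see what the lookup renders, the string formatting in as_sql can be replayed by hand. The values of `lhs` and `rhs` below are assumptions about what process_lhs and process_rhs typically return for a plain column and a list value (a quoted column reference and a single `%s` placeholder), and the table name is hypothetical.

```python
# Assumed return values, for illustration only.
lhs = '"myapp_mymodel"."id"'  # hypothetical quoted column reference
rhs = "%s"                    # one placeholder standing for the whole list
ids = [3, 1, 4]

# Same formatting step as in as_sql: the placeholder survives into the SQL.
sql = "%s IN (SELECT unnest(%s))" % (lhs, rhs)
params = [ids]  # the list is a single parameter, sent as one array

print(sql)  # "myapp_mymodel"."id" IN (SELECT unnest(%s))
```

The remaining `%s` is left for the database driver to fill in with the list as a single array value, rather than as one literal per id.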
ilse2005 answered Oct 20 '22 22:10



The trick is to transform the array to a set somehow.

Instead of (this form is only good for a short array):

SELECT *
FROM   tbl t
WHERE  t.tbl_id = ANY($1);
-- WHERE  t.tbl_id IN (...);  -- equivalent, given a list of values rather than an array

$1 being the array parameter.

You can still pass the array as before, but unnest it and join:

SELECT *
FROM   tbl t
JOIN   unnest($1) arr(id) ON arr.id = t.tbl_id;

Or you can keep your query, too, but replace the array with a subquery unnesting it:

SELECT * FROM tbl t
WHERE  t.tbl_id = ANY (SELECT unnest($1));

Or:

SELECT * FROM tbl t
WHERE  t.tbl_id IN    (SELECT unnest($1));

This has the same effect on performance as passing a set with a VALUES expression, but passing the array is typically much simpler.
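From Python, the join variant could be wrapped roughly as follows. This is a sketch under assumptions: `fetch_by_ids` and the table name are hypothetical, the cursor is any DB-API cursor whose driver adapts a Python list to a PostgreSQL array (psycopg does), and the table and column are trusted identifiers.

```python
def fetch_by_ids(cursor, table, column, ids):
    """Fetch rows whose `column` matches one of `ids` via unnest + join.

    `table` and `column` are hypothetical, trusted identifiers; the ids are
    sent as ONE array parameter rather than one parameter per id.
    """
    sql = (
        "SELECT t.* FROM {t} AS t "
        "JOIN unnest(%s) AS arr(id) ON arr.id = t.{c}"
    ).format(t=table, c=column)
    # Wrap the list in another list: it is a single query parameter.
    cursor.execute(sql, [list(ids)])
    return cursor.fetchall()
```

In Django this might be called with the cursor from `django.db.connection.cursor()`. Note that, unlike the IN and ANY forms, the join returns a row once per matching array element, so duplicate ids in the array produce duplicate rows.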

Detailed explanation:

  • IN vs ANY operator in PostgreSQL
  • How to use ANY instead of IN in a WHERE clause with Rails?
  • Optimizing a Postgres query with a large IN
Erwin Brandstetter answered Oct 20 '22 20:10
