
Django ORM: Equivalent of SQL `NOT IN`? `exclude` and `Q` objects do not work

The Problem

I'm trying to use the Django ORM to do the equivalent of a SQL NOT IN clause, providing a list of IDs in a subselect to bring back a set of records from the logging table. I can't figure out if this is possible.

The Model

class JobLog(models.Model):
    job_number = models.BigIntegerField(blank=True, null=True)
    name = models.TextField(blank=True, null=True)
    username = models.TextField(blank=True, null=True)
    event = models.TextField(blank=True, null=True)
    time = models.DateTimeField(blank=True, null=True)

What I've Tried

My first attempt was to use exclude, but this wraps the entire IN condition in NOT ( ... ) rather than producing the desired NOT IN:

query = (
    JobLog.objects.values(
        "username", "job_number", "name", "time",
    )
    .filter(time__gte=start, time__lte=end, event="delivered")
    .exclude(
        job_number__in=models.Subquery(
            JobLog.objects.values_list("job_number", flat=True).filter(
                time__gte=start, time__lte=end, event="finished",
            )
        )
    )
)

Unfortunately, this yields this SQL:

SELECT "view_job_log"."username", "view_job_log"."group", "view_job_log"."job_number", "view_job_log"."name", "view_job_log"."time"
FROM "view_job_log"
WHERE (
    "view_job_log"."event" = 'delivered'
    AND "view_job_log"."time" >= '2020-03-12T11:22:28.300590+00:00'::timestamptz
    AND "view_job_log"."time" <= '2020-03-13T11:22:28.300600+00:00'::timestamptz
    AND NOT (
        "view_job_log"."job_number" IN (
            SELECT U0."job_number"
            FROM "view_job_log" U0
            WHERE (
                U0."event" = 'finished' AND U0."time" >= '2020-03-12T11:22:28.300590+00:00'::timestamptz
                AND U0."time" <= '2020-03-13T11:22:28.300600+00:00'::timestamptz
            )
        )
        AND "view_job_log"."job_number" IS NOT NULL
    )
)

What I need is for the third AND clause to be AND "view_job_log"."job_number" NOT IN instead of the AND NOT (.
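
The two predicates are not interchangeable once job_number can be NULL, which is presumably why the ORM adds the IS NOT NULL guard. Here is a minimal sketch of the difference using Python's built-in sqlite3 module. The job_log table and its rows are made up for illustration and merely stand in for view_job_log:

```python
import sqlite3

# Hypothetical in-memory table standing in for view_job_log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_log (job_number INTEGER, event TEXT)")
conn.executemany(
    "INSERT INTO job_log VALUES (?, ?)",
    [(1, "delivered"), (2, "delivered"), (None, "delivered"), (2, "finished")],
)

FINISHED = "SELECT job_number FROM job_log WHERE event = 'finished'"

# The shape Django emits for exclude(job_number__in=...).
django_style = conn.execute(
    "SELECT job_number FROM job_log WHERE event = 'delivered' "
    f"AND NOT (job_number IN ({FINISHED}) AND job_number IS NOT NULL) "
    "ORDER BY job_number"
).fetchall()

# The plain NOT IN form.
plain_not_in = conn.execute(
    "SELECT job_number FROM job_log WHERE event = 'delivered' "
    f"AND job_number NOT IN ({FINISHED}) "
    "ORDER BY job_number"
).fetchall()

print(django_style)   # keeps the delivered row whose job_number is NULL
print(plain_not_in)   # drops it, because NULL NOT IN (...) is NULL
```

In this toy data set the only difference is the delivered row with a NULL job_number: the ORM-style predicate keeps it, the plain NOT IN does not.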

I've also tried doing the sub-select as its own query first, with an exclude, as suggested here:

Django equivalent of SQL not in

However, this yields the same problematic result. Then I tried a Q object, which yields a similar query:

query = (
    JobLog.objects.values(
        "username", "subscriber_code", "job_number", "name", "time",
    )
    .filter(
        ~models.Q(job_number__in=models.Subquery(
            JobLog.objects.values_list("job_number", flat=True).filter(
                time__gte=start, time__lte=end, event="finished",
            )
        )),
        time__gte=start,
        time__lte=end,
        event="delivered",
    )
)

This attempt with the Q object yields the following SQL, again, without the NOT IN:

SELECT "view_job_log"."username", "view_job_log"."group", "view_job_log"."job_number", "view_job_log"."name", "view_job_log"."time"

FROM "view_job_log" WHERE (
    NOT (
        "view_job_log"."job_number" IN (
            SELECT U0."job_number"
            FROM "view_job_log" U0
            WHERE (
                U0."event" = 'finished'
                AND U0."time" >= '2020-03-12T11:33:28.098653+00:00'::timestamptz
                AND U0."time" <= '2020-03-13T11:33:28.098678+00:00'::timestamptz
            )
        )
        AND "view_job_log"."job_number" IS NOT NULL
    )
    AND "view_job_log"."event" = 'delivered'
    AND "view_job_log"."time" >= '2020-03-12T11:33:28.098653+00:00'::timestamptz
    AND "view_job_log"."time" <= '2020-03-13T11:33:28.098678+00:00'::timestamptz
)

Is there any way to get Django's ORM to do something equivalent to AND job_number NOT IN (12345, 12346, 12347)? Or am I going to have to drop to raw SQL to accomplish this?

Thanks in advance for reading this entire wall-of-text question. Explicit is better than implicit. :)

FlipperPA asked Mar 13 '20 14:03



2 Answers

I think the easiest way to do this would be to define a custom lookup, similar to this one or to the built-in in lookup:

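from django.db.models import Field
from django.db.models.lookups import In as LookupIn


class NotIn(LookupIn):
    lookup_name = "notin"

    def get_rhs_op(self, connection, rhs):
        # Same as the built-in "in" lookup, with NOT prepended.
        return "NOT IN %s" % rhs


# Make <field>__notin=... available on every field type.
Field.register_lookup(NotIn)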

or

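from django.db import models


class NotIn(models.Lookup):
    lookup_name = "notin"

    def as_sql(self, compiler, connection):
        lhs, lhs_params = self.process_lhs(compiler, connection)
        rhs, rhs_params = self.process_rhs(compiler, connection)
        # Build a new sequence instead of mutating lhs_params in place.
        params = (*lhs_params, *rhs_params)
        return "%s NOT IN %s" % (lhs, rhs), params


models.Field.register_lookup(NotIn)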

Then use it in your query:

query = (
    JobLog.objects.values(
        "username", "job_number", "name", "time",
    )
    .filter(time__gte=start, time__lte=end, event="delivered")
    .filter(
        job_number__notin=models.Subquery(
            JobLog.objects.values_list("job_number", flat=True).filter(
                time__gte=start, time__lte=end, event="finished",
            )
        )
    )
)

This generates the SQL:

SELECT
    "people_joblog"."username",
    "people_joblog"."job_number",
    "people_joblog"."name",
    "people_joblog"."time"
FROM
    "people_joblog"
WHERE ("people_joblog"."event" = 'delivered'
    AND "people_joblog"."time" >= '2020-03-13 15:24:34.691222+00:00'
    AND "people_joblog"."time" <= '2020-03-13 15:24:41.678069+00:00'
    AND "people_joblog"."job_number" NOT IN (
        SELECT
            U0."job_number"
        FROM
            "people_joblog" U0
        WHERE (U0."event" = 'finished'
            AND U0."time" >= '2020-03-13 15:24:34.691222+00:00'
            AND U0."time" <= '2020-03-13 15:24:41.678069+00:00')))
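
One caveat worth checking before relying on a raw NOT IN: job_number is nullable in the model, and a single NULL among the "finished" rows makes the predicate evaluate to NULL for every non-matching row, so nothing comes back. A small sketch with the standard-library sqlite3 module shows this. The table and rows are invented for illustration and the SQL is handwritten rather than produced by the ORM:

```python
import sqlite3

# Hypothetical in-memory table standing in for the job log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_log (job_number INTEGER, event TEXT)")
conn.executemany(
    "INSERT INTO job_log VALUES (?, ?)",
    [(1, "delivered"), (2, "delivered"), (2, "finished")],
)

NOT_IN = (
    "SELECT job_number FROM job_log WHERE event = 'delivered' "
    "AND job_number NOT IN "
    "(SELECT job_number FROM job_log WHERE event = 'finished') "
    "ORDER BY job_number"
)

# With no NULLs in the subquery, job 1 is reported as unfinished.
before = conn.execute(NOT_IN).fetchall()

# Add one finished row with a NULL job_number and run the same query.
conn.execute("INSERT INTO job_log VALUES (NULL, 'finished')")
after = conn.execute(NOT_IN).fetchall()

print(before)
print(after)
```

If that is not the behaviour you want, adding job_number__isnull=False to the inner queryset should keep NULLs out of the subquery.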
Lotram answered Oct 30 '22 20:10


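You can likely achieve the same results by using Exists and special-casing NULLs.

# Requires: from django.db.models import Exists, OuterRef, Q
.filter(
   ~Exists(
       JobLog.objects.filter(
           # Match the outer row's job_number, or any NULL job_number,
           # to mirror how NOT IN treats NULLs in the subquery.
           Q(job_number=None) | Q(job_number=OuterRef('job_number')),
           time__gte=start,
           time__lte=end,
           event='finished',
       )
   )
)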
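
To sanity-check the idea outside the ORM, here is a rough sketch comparing a handwritten NOT IN with the corresponding NOT EXISTS in SQLite, via Python's built-in sqlite3 module. The table, rows, and aliases (o, i) are assumptions made for the example, and the SQL only approximates what the Exists expression compiles to:

```python
import sqlite3

# Hypothetical in-memory table standing in for the job log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_log (job_number INTEGER, event TEXT)")
conn.executemany(
    "INSERT INTO job_log VALUES (?, ?)",
    [(1, "delivered"), (2, "delivered"), (2, "finished")],
)

NOT_IN = """
    SELECT job_number FROM job_log WHERE event = 'delivered'
    AND job_number NOT IN (
        SELECT job_number FROM job_log WHERE event = 'finished')
    ORDER BY job_number
"""
NOT_EXISTS = """
    SELECT o.job_number FROM job_log o WHERE o.event = 'delivered'
    AND NOT EXISTS (
        SELECT 1 FROM job_log i
        WHERE i.event = 'finished'
        AND (i.job_number IS NULL OR i.job_number = o.job_number))
    ORDER BY o.job_number
"""


def run(sql):
    return conn.execute(sql).fetchall()


# No NULLs among the finished rows: both forms report job 1.
without_null = (run(NOT_IN), run(NOT_EXISTS))

# One finished row with a NULL job_number: both forms return nothing.
conn.execute("INSERT INTO job_log VALUES (NULL, 'finished')")
with_null = (run(NOT_IN), run(NOT_EXISTS))

print(without_null)
print(with_null)
```

The two forms agree here for delivered rows with a non-NULL job_number. A delivered row whose own job_number is NULL is a remaining difference: NOT EXISTS keeps it when no finished row has a NULL job_number, whereas NOT IN against a non-empty subquery drops it.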
Simon Charette answered Oct 30 '22 20:10