
How do I write a Django query with a subquery as part of the WHERE clause?

I'm using Django and Python 3.7. I'm having trouble figuring out how to write a Django query that has a subquery as part of the WHERE clause. Here are the models:

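from django.core.validators import MaxValueValidator, MinValueValidator
from django.db import models


class Article(models.Model):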
    objects = ArticleManager()
    title = models.TextField(default='', null=False)
    created_on = models.DateTimeField(auto_now_add=True)


class ArticleStat(models.Model):
    objects = ArticleStatManager()
    article = models.ForeignKey(Article, on_delete=models.CASCADE, related_name='articlestats')
    elapsed_time_in_seconds = models.IntegerField(default=0, null=False)
    votes = models.FloatField(default=0, null=False)


class StatByHour(models.Model):
    index = models.FloatField(default=0)
    # this tracks the hour when the article came out
    hour_of_day = models.IntegerField(
        null=False,
        validators=[
            MaxValueValidator(23),
            MinValueValidator(0)
        ]
    )

In PostgreSQL, the query would look similar to this:

SELECT *
FROM article a,
     articlestat ast
WHERE a.id = ast.article_id
  AND ast.votes > 100 * (
    SELECT "index" 
    FROM statbyhour 
    WHERE hour_of_day = extract(hour from (a.created_on + 1000 * interval '1 second')))

Notice the subquery as part of the WHERE clause:

ast.votes > 100 * (select index from statbyhour where hour_of_day = extract(hour from (a.created_on + 1000 * interval '1 second'))) 
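
To try the shape of that SQL without a PostgreSQL server, here is a minimal sketch that runs the same kind of correlated subquery against an in-memory SQLite database using only the standard library. SQLite has no extract() or interval syntax, so strftime() with a '+1000 seconds' modifier stands in for them here; the table layout mirrors the models above, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE article (id INTEGER PRIMARY KEY, created_on TEXT NOT NULL);
    CREATE TABLE articlestat (
        id INTEGER PRIMARY KEY,
        article_id INTEGER NOT NULL REFERENCES article(id),
        votes REAL NOT NULL
    );
    CREATE TABLE statbyhour (
        id INTEGER PRIMARY KEY,
        "index" REAL NOT NULL,
        hour_of_day INTEGER NOT NULL
    );
    -- Made-up sample data.
    INSERT INTO article VALUES (1, '2019-04-11 17:00:00'), (2, '2019-04-11 03:50:00');
    INSERT INTO articlestat VALUES (1, 1, 250.0), (2, 1, 150.0), (3, 2, 120.0);
    INSERT INTO statbyhour VALUES (1, 2.0, 17), (2, 1.0, 4);
    """
)

# The inner SELECT refers to a.created_on from the outer query, so it is
# evaluated per outer row rather than once up front.
rows = conn.execute(
    """
    SELECT ast.id, ast.votes
    FROM article a
    JOIN articlestat ast ON a.id = ast.article_id
    WHERE ast.votes > 100 * (
        SELECT "index"
        FROM statbyhour
        WHERE hour_of_day =
              CAST(strftime('%H', a.created_on, '+1000 seconds') AS INTEGER)
    )
    ORDER BY ast.id
    """
).fetchall()

print(rows)
```

With this data, article 1 lands in hour 17 (threshold 200) and article 2 lands in hour 4 after the 1000-second shift (threshold 100), so only the stats with 250 and 120 votes pass the filter.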

So I thought I could do something like this ...

hour_filter = Func(
    Func(
        (F("article__created_on") + avg_fp_time_in_seconds * "interval '1 second'"),
        function='HOUR FROM'),
    function='EXTRACT')
...
votes_criterion2 = Q(votes__gte=F("article__website__stats__total_score") / F(
    "article__website__stats__num_articles") * settings.TRENDING_PCT_FLOOR *
                                StatByHour.objects.get(hour_of_day=hour_filter) * day_of_week_index)
qset = ArticleStat.objects.filter(votes_criterion1 & votes_criterion2,
                                  comments__lte=25)

but this results in a "Cannot resolve keyword 'article' into field. Choices are: hour_of_day, id, index, num_articles, total_score" error. I think this is because Django is evaluating my "StatByHour.objects" query before the larger query that contains it is run, but I don't know how to rewrite things so that the subquery runs as part of the larger query.

Edit: OK, I moved my subquery into an actual "Subquery" expression and referenced the filter I created using OuterRef ...

hour_filter = Func(
    Func(
        (F("article__created_on") + avg_fp_time_in_seconds * "interval '1 second'"),
        function='HOUR FROM'),
    function='EXTRACT')
query = StatByHour.objects.get(hour_of_day=OuterRef(hour_filter))


...
votes_criterion2 = Q(votes__gte=F("article__website__stats__total_score") / F(
    "article__website__stats__num_articles") * settings.TRENDING_PCT_FLOOR *
                                Subquery(query) * 
                 day_of_week_index)
qset = ArticleStat.objects.filter(votes_criterion1 & votes_criterion2,
                                  comments__lte=25)

and this results in the error

This queryset contains a reference to an outer query and may only be used in a subquery.

which is odd because I am using it in a subquery.

Edit #2: Even after changing the query per the answer given ...

hour_filter = Func(
    Func(
        (F("article__created_on") + avg_fp_time_in_seconds * "interval '1 second'"),
        function='HOUR FROM'),
    function='EXTRACT')
query = StatByHour.objects.filter(hour_of_day=OuterRef(hour_filter))[:1]

...
votes_criterion2 = Q(votes__gte=F("article__website__stats__total_score") / F(
    "article__website__stats__num_articles") * settings.TRENDING_PCT_FLOOR *
                                Subquery(query) *
                                day_of_week_index)
qset = ArticleStat.objects.filter(et_criterion1 & et_criterion2 & et_criterion3,
                                  votes_criterion1 & votes_criterion2,
                                  article__front_page_first_appeared_date__isnull=True,
                                  comments__lte=25)

I still get the error

'Func' object has no attribute 'split'
Asked by Dave on Apr 11 '19 at 17:04



2 Answers

Subqueries need to be queries that are not immediately evaluated so that their evaluation can be postponed until the outer query is run. get() does not fit the bill, as it is executed immediately and returns an object instance rather than a QuerySet.

However, substituting filter for get and then taking a [:1] slice should work:

StatByHour.objects.filter(hour_of_day=OuterRef('hour_filter')).values('hour_of_day')[:1]

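Note how the field reference in OuterRef is a string literal rather than a variable. The name is resolved against the outer queryset, so hour_filter has to exist there as a field or an annotation.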

Moreover, subqueries need to return a single column and a single row (as they are assigned to a single field), hence the values() and the slicing above.

Also, I haven't used a subquery in a Q object yet; I'm not sure it will work. You may have to save the subquery output in an annotation first and then use that for your filter calculations.

Answered by Endre Both on Oct 14 '22 at 17:10


Filter by a subquery that is itself filtered with hour_of_day=ExtractHour(OuterRef('article__created_on') + timedelta(seconds=avg_fp_time_in_seconds)). Real code needs one extra ExpressionWrapper and works only on Django >= 2.1.0:

import datetime

from django.db import models
from django.db.models import F, OuterRef, Subquery, Value
from django.db.models.functions import ExtractHour, Coalesce
from django.db.models.expressions import ExpressionWrapper


relevant_hour_stats = (
    StatByHour.objects
    .filter(
        hour_of_day=ExtractHour(ExpressionWrapper(
            OuterRef('article__created_on')  # NOTE: `OuterRef()+Expression` works only on Django >= 2.1.0
            +
            datetime.timedelta(seconds=avg_fp_time_in_seconds),
            output_field=models.DateTimeField()
        )),
    )
    .annotate(
        votes_threshold=Coalesce(
            100.0 * F('index'),
            0.0,
            output_field=models.FloatField(),
        ),
    )
    .order_by('-votes_threshold')
    # NOTE: your StatByHour model does not have unique=True on the hour_of_day
    # field, so there may be several stats for the same hour.
    # Your SQL example does not make clear how those should be handled, so I
    # assume that the "greatest" threshold is needed.
)

article_stats = (
    ArticleStat.objects
    .all()
    .filter(
        votes__gt=Coalesce(
            Subquery(relevant_hour_stats.values('votes_threshold')[:1]),
            Value(0.0),
            output_field=models.FloatField(),
        ),
    )
)

P.S. It would be much easier if you set up a "demo project" on GitHub so that anyone can clone it and check their ideas locally.

P.P.S. This code is tested to be working, but on different models/fields:

In [15]: relevant_something = (ModelOne.objects.filter(index=ExtractHour(ExpressionWrapper(OuterRef('due_date') + datetime.timedelta(seconds=1000), output_field=models.DateTimeField()))).annotate(votes_threshold=100*F('indent')).order_by('-votes_threshold'))

In [16]: ts = ModelTwo.objects.all().filter(votes__gt=Subquery(relevant_something.values('votes_threshold')[:1], output_field=models.IntegerField()))

In [17]: print(ts.query)
SELECT 
    ...
FROM 
    "some_app_model_two" 
WHERE 
    "some_app_model_two"."votes" > (
        SELECT 
            (100 * U0."indent") AS "votes_threshold" 
        FROM 
            "some_app_model_one" U0 
        WHERE 
            U0."index" = (
                EXTRACT(
                    'hour' 
                    FROM ("some_app_model_two"."due_date" + 0:16:40) 
                    AT TIME ZONE 'America/Los_Angeles'
                )
            ) 
        ORDER BY "votes_threshold" DESC 
        LIMIT 1
    )
ORDER BY 
    "some_app_model_two"."due_date" ASC, 
    "some_app_model_two"."priority" ASC, 
    "some_app_model_two"."updated_at" DESC

So if you are getting any errors with it, please show the ACTUAL code that you are running.

Answered by imposeren on Oct 14 '22 at 15:10