I'm using Django and Python 3.7. I'm having trouble figuring out how to write a Django query where there's a subquery as part of a WHERE clause. Here are the models ...
class Article(models.Model):
    objects = ArticleManager()
    title = models.TextField(default='', null=False)
    created_on = models.DateTimeField(auto_now_add=True)


class ArticleStat(models.Model):
    objects = ArticleStatManager()
    article = models.ForeignKey(Article, on_delete=models.CASCADE, related_name='articlestats')
    elapsed_time_in_seconds = models.IntegerField(default=0, null=False)
    votes = models.FloatField(default=0, null=False)


class StatByHour(models.Model):
    index = models.FloatField(default=0)
    # this tracks the hour when the article came out
    hour_of_day = models.IntegerField(
        null=False,
        validators=[
            MaxValueValidator(23),
            MinValueValidator(0)
        ]
    )
In PostgreSQL, the query would look similar to
SELECT *
FROM article a,
     articlestat ast
WHERE a.id = ast.article_id
  AND ast.votes > 100 * (
      SELECT "index"
      FROM statbyhour
      WHERE hour_of_day = extract(hour from (a.created_on + 1000 * interval '1 second')))
Notice the subquery as part of the WHERE clause
ast.votes > 100 * (select index from statbyhour where hour_of_day = extract(hour from (a.created_on + 1000 * interval '1 second')))
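For anyone who wants to see what that correlated subquery returns before porting it to the ORM, here is a minimal sketch using Python's built-in sqlite3 module. The table layouts, the sample rows, and the SQLite spelling of the hour extraction (strftime with a '+1000 seconds' modifier in place of PostgreSQL's extract/interval) are all assumptions made for illustration; they are not the real schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, created_on TEXT);
    CREATE TABLE articlestat (id INTEGER PRIMARY KEY, article_id INTEGER, votes REAL);
    CREATE TABLE statbyhour (id INTEGER PRIMARY KEY, "index" REAL, hour_of_day INTEGER);

    -- Hypothetical sample rows, for illustration only.
    INSERT INTO article VALUES (1, '2019-03-01 09:50:00'), (2, '2019-03-01 14:00:00');
    INSERT INTO articlestat VALUES (1, 1, 250.0), (2, 2, 250.0);
    INSERT INTO statbyhour VALUES (1, 2.0, 10), (2, 3.0, 14);
""")

# Same shape as the PostgreSQL query: the inner SELECT refers to a.created_on
# from the outer query, so it is evaluated once per outer row.
query = """
    SELECT ast.id
    FROM article a, articlestat ast
    WHERE a.id = ast.article_id
      AND ast.votes > 100 * (
          SELECT "index"
          FROM statbyhour
          WHERE hour_of_day = CAST(strftime('%H', a.created_on, '+1000 seconds') AS INTEGER)
      )
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With these sample rows, article 1 lands in hour 10 after the 1000-second shift (threshold 100 * 2.0 = 200, which 250 votes exceed), while article 2 lands in hour 14 (threshold 300, which 250 votes do not), so only the first stat row is selected.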
So I thought I could do something like this ...
hour_filter = Func(
    Func(
        (F("article__created_on") + avg_fp_time_in_seconds * "interval '1 second'"),
        function='HOUR FROM'),
    function='EXTRACT')
...
votes_criterion2 = Q(votes__gte=F("article__website__stats__total_score") / F(
    "article__website__stats__num_articles") * settings.TRENDING_PCT_FLOOR *
    StatByHour.objects.get(hour_of_day=hour_filter) * day_of_week_index)
qset = ArticleStat.objects.filter(votes_criterion1 & votes_criterion2,
                                  comments__lte=25)
but this results in a "Cannot resolve keyword 'article' into field. Choices are: hour_of_day, id, index, num_articles, total_score" error. I think this is because Django is evaluating my "StatByHour.objects" query before the larger query that contains it is run, but I don't know how to rewrite things to get the subquery to run at the same time.
Edit: OK, I moved my subquery into an actual "Subquery" function and referenced the filter I created using OuterRef ...
hour_filter = Func(
    Func(
        (F("article__created_on") + avg_fp_time_in_seconds * "interval '1 second'"),
        function='HOUR FROM'),
    function='EXTRACT')
query = StatByHour.objects.get(hour_of_day=OuterRef(hour_filter))
...
votes_criterion2 = Q(votes__gte=F("article__website__stats__total_score") / F(
    "article__website__stats__num_articles") * settings.TRENDING_PCT_FLOOR *
    Subquery(query) *
    day_of_week_index)
qset = ArticleStat.objects.filter(votes_criterion1 & votes_criterion2,
                                  comments__lte=25)
and this results in the error
This queryset contains a reference to an outer query and may only be used in a subquery.
which is odd because I am using it in a subquery.
Edit #2: Even after changing the query per the answer given ...
hour_filter = Func(
    Func(
        (F("article__created_on") + avg_fp_time_in_seconds * "interval '1 second'"),
        function='HOUR FROM'),
    function='EXTRACT')
query = StatByHour.objects.filter(hour_of_day=OuterRef(hour_filter))[:1]
...
votes_criterion2 = Q(votes__gte=F("article__website__stats__total_score") / F(
    "article__website__stats__num_articles") * settings.TRENDING_PCT_FLOOR *
    Subquery(query) *
    day_of_week_index)
qset = ArticleStat.objects.filter(et_criterion1 & et_criterion2 & et_criterion3,
                                  votes_criterion1 & votes_criterion2,
                                  article__front_page_first_appeared_date__isnull=True,
                                  comments__lte=25)
I still get the error
'Func' object has no attribute 'split'
You can add an explicit subquery to a QuerySet using the Subquery expression. The examples in this section are designed to show how to force Django to execute a subquery.
According to the documentation on OuterRef, it acts like an F expression except that the check to see if it refers to a valid field isn't made until the outer queryset is resolved.
Join queries: a join can be done with the select_related method, which Django describes as returning a QuerySet that will "follow" foreign-key relationships, selecting additional related-object data when it executes its query.
A subquery is used to return data that will be used in the main query as a condition to further restrict the data to be retrieved. Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along with the operators like =, <, >, >=, <=, IN, BETWEEN, etc.
Subqueries need to be queries that are not immediately evaluated so that their evaluation can be postponed until the outer query is run. get() does not fit the bill as it is executed immediately and returns an object instance rather than a QuerySet.
However, substituting filter for get and then taking a [:1] slice should work:
StatByHour.objects.filter(hour_of_day=OuterRef('hour_filter')).values('hour_of_day')[:1]
Note how the field reference in OuterRef is a string literal rather than a variable.
Moreover, subqueries need to return a single column and a single row (as they are assigned to a single field), hence the values() and the slicing above.
Also, I haven't used a subquery in a Q object yet; I'm not sure it will work. You may have to save the subquery output in an annotation first and then use that for your filter calculations.
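On the note above that OuterRef takes a string rather than a variable: a plausible reading of the "'Func' object has no attribute 'split'" error is that the name given to OuterRef is later split on the "__" lookup separator, which only works on a string. The sketch below is plain Python with hypothetical stand-ins (split_lookup and this Func class are illustrations, not Django code), so treat it as an assumption about the cause rather than a trace of Django's internals.

```python
LOOKUP_SEP = "__"  # same value as django.db.models.constants.LOOKUP_SEP


def split_lookup(name):
    """Toy stand-in: break a lookup path such as 'a__b' into its parts."""
    return name.split(LOOKUP_SEP)


class Func:
    """Hypothetical placeholder for an expression object; NOT Django's Func."""


# A string field path splits cleanly into relation and field names.
parts = split_lookup("article__created_on")

# An expression object has no .split(), so the same call fails.
try:
    split_lookup(Func())
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(parts)
print(error_message)
```

If this reading is right, the fix is to pass OuterRef a plain field path and apply any functions around it, not inside it.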
Filter by a subquery that is itself filtered with hour_of_day=ExtractHour(OuterRef('article__created_on') + timedelta(seconds=avg_fp_time_in_seconds)). Real code will require one extra ExpressionWrapper and only works on Django >= 2.1.0:
import datetime
from django.db import models
from django.db.models import F, OuterRef, Subquery, Value
from django.db.models.functions import ExtractHour, Coalesce
from django.db.models.expressions import ExpressionWrapper

relevant_hour_stats = (
    StatByHour.objects
    .filter(
        hour_of_day=ExtractHour(ExpressionWrapper(
            OuterRef('article__created_on')  # NOTE: `OuterRef()+Expression` works only on Django >= 2.1.0
            +
            datetime.timedelta(seconds=avg_fp_time_in_seconds),
            output_field=models.DateTimeField()
        )),
    )
    .annotate(
        votes_threshold=Coalesce(
            100.0 * F('index'),
            0.0,
            output_field=models.FloatField(),
        ),
    )
    .order_by('-votes_threshold')
    # NOTE: your StatByHour model does not have unique=True on the hour_of_day
    # field, so there may be several stats for the same hour.
    # From your SQL example it's unclear how they should be handled, so I
    # assume that the "greatest" threshold is needed.
)
article_stats = (
    ArticleStat.objects
    .all()
    .filter(
        votes__gt=Coalesce(
            Subquery(relevant_hour_stats.values('votes_threshold')[:1]),
            Value(0.0),
            output_field=models.FloatField(),
        ),
    )
)
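To sanity-check which hour_of_day row the filter above should match for a given article, the same arithmetic can be done in plain Python. This is only a sketch: shifted_hour is a hypothetical helper, the timestamps are made up, the value 1000 for avg_fp_time_in_seconds is borrowed from the SQL in the question, and any time-zone conversion the database applies is ignored.

```python
import datetime

avg_fp_time_in_seconds = 1000  # assumed value, taken from the SQL in the question


def shifted_hour(created_on, offset_seconds):
    """Hour of day after shifting created_on forward by offset_seconds."""
    return (created_on + datetime.timedelta(seconds=offset_seconds)).hour


# Hypothetical timestamps, for illustration only.
morning = datetime.datetime(2019, 3, 1, 9, 50, 0)
late_night = datetime.datetime(2019, 3, 1, 23, 50, 0)

morning_hour = shifted_hour(morning, avg_fp_time_in_seconds)
late_night_hour = shifted_hour(late_night, avg_fp_time_in_seconds)

print(morning_hour, late_night_hour)
```

The second timestamp shows the roll-over past midnight: 23:50:00 plus 1000 seconds is 00:06:40 on the next day, so the matching hour_of_day is 0, which is still inside the 0 to 23 range enforced by the validators.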
P.S. It would be much easier if you set up some "demo project" on github so that anyone can clone it and check their ideas locally.
P.P.S. This code is tested to be working, but on different models/fields:
In [15]: relevant_something = (ModelOne.objects.filter(index=ExtractHour(ExpressionWrapper(OuterRef('due_date') + datetime.timedelta(seconds=1000), output_field=models.DateTimeField()))).annotate(votes_threshold=100*F('indent')).order_by('-votes_threshold'))
In [16]: ts = ModelTwo.objects.all().filter(votes__gt=Subquery(relevant_something.values('votes_threshold')[:1], output_field=models.IntegerField()))
In [17]: print(ts.query)
SELECT
    ...
FROM
    "some_app_model_two"
WHERE
    "some_app_model_two"."votes" > (
        SELECT
            (100 * U0."indent") AS "votes_threshold"
        FROM
            "some_app_model_one" U0
        WHERE
            U0."index" = (
                EXTRACT(
                    'hour'
                    FROM ("some_app_model_two"."due_date" + 0:16:40)
                    AT TIME ZONE 'America/Los_Angeles'
                )
            )
        ORDER BY "votes_threshold" DESC
        LIMIT 1
    )
ORDER BY
    "some_app_model_two"."due_date" ASC,
    "some_app_model_two"."priority" ASC,
    "some_app_model_two"."updated_at" DESC
So if you are getting any errors with it, then please show the ACTUAL code that you are running.
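A small note on reading the printed SQL: the odd-looking `0:16:40` term appears to be nothing more than the string form of the timedelta(seconds=1000) parameter substituted into the query text by print(ts.query). A quick check in plain Python, shown here only to explain that output:

```python
import datetime

# The 1000-second offset used in the tested session above.
offset = datetime.timedelta(seconds=1000)

# str() renders a timedelta as H:MM:SS, i.e. 16 minutes and 40 seconds.
rendered = str(offset)
print(rendered)
```

So the printed query is a debugging rendering, not necessarily the exact SQL text sent to the database.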