Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

django annotate with queryset

I have Users who take Surveys periodically. The system has multiple surveys which it issues at set intervals from the submitted date of the last issued survey of that particular type.

class Survey(Model):
    name = CharField()
    description = TextField()
    interval = DurationField()  
    users = ManyToManyField(User, related_name='registered_surveys')
    ...

class SurveyRun(Model):
    ''' A users answers for 1 taken survey '''
    user = ForeignKey(User, related_name='runs')
    survey = ForeignKey(Survey, related_name='runs')
    created = models.DateTimeField(auto_now_add=True)
    submitted = models.DateTimeField(null=True, blank=True)
    # answers = ReverseForeignKey...

So with the models above a user should be alerted to take survey A next on this date:

A.interval + SurveyRun.objects.filter(
    user=user, 
    survey=A
).latest('submitted').submitted

I want to run a daily periodic task which queries all users and creates new runs for all users who have a survey due according to this criteria:

For each survey the user is registered:

  • if no runs exist for that user-survey combo then create the first run for that user-survey combination and alert the user
  • if there are runs for that survey and none are open (an open run has been created but not submitted so submitted=None) and the latest one's submitted date plus the survey's interval is <= today, create a new run for that user-survey combo and alert the user

Ideally I could create a manager method which would annotate with a surveys_due field like:

users_with_surveys_due = User.objects.with_surveys_due().filter(surveys_due__isnull=False)

Where the annotated field would be a queryset of Survey objects for which the user needs to submit a new round of answers. And I could issue alerts like this:

for user in users_with_surveys_due.all():
    for survey in user.surveys_due:
        new_run = SurveyRun.objects.create(
            user=user,
            survey=survey
        )
        alert_user(user, run)

However I would settle for a boolean flag annotation on the User object indicating one of the registered_surveys needs to create a new run.

How would I go about implementing something like this with_surveys_due() manager method so Postgres does all the heavy lifting? Is it possible to annotate with a collection objects, like a reverse FK?

UPDATE:

For clarity here is my current task in python:

def make_new_runs_and_alert_users():
    runs = []
    Srun = apps.get_model('surveys', 'SurveyRun')
    for user in get_user_model().objects.prefetch_related('registered_surveys', 'runs').all():
        for srvy in user.registered_surveys.all():
            runs_for_srvy = user.runs.filter(survey=srvy)
            # no runs exist for this registered survey, create first run
            if not runs_for_srvy.exists():
                runs.append(Srun(user=user, survey=srvy))
                ...

            # check this survey has no open runs
            elif not runs_for_srvy.filter(submitted=None).exists():
                latest = runs_for_srvy.latest('submitted')
                if (latest.submitted + qnr.interval) <= timezone.now():
                    runs.append(Srun(user=user, survey=srvy))
    Srun.objects.bulk_create(runs)

UPDATE #2:

In attempting to use Dirk's solution I have this simple example:

In [1]: test_user.runs.values_list('survey__name', 'submitted')                                                                                                                                     
Out[1]: <SurveyRunQuerySet [('Test', None)]>
In [2]: test_user.registered_surveys.values_list('name', flat=True)                                                                                                                                 
Out[2]: <SurveyQuerySet ['Test']>

The user has one open run (submitted=None) for the Test survey and is registered to one survey (Test). He/She should not be flagged for a new run seeing as there is an un-submitted run outstanding for the only survey he/she is registered for. So I create a function encapsulating the Dirk's solution called get_users_with_runs_due:

In [10]: get_users_with_runs_due()                                                                                                                                                                  
Out[10]: <UserQuerySet [<User: [email protected]>]> . # <-- should be an empty queryset

In [107]: for user in _: 
              print(user.email, i.has_survey_due)  
[email protected] True  # <-- should be false

UPDATE #3:

In my previous update I had made some changes to the logic to properly match what I wanted but neglected to mention or show the changes. Here is the query function below with comments by the changes:

def get_users_with_runs_due():
    today = timezone.now()

    survey_runs = SurveyRun.objects.filter(
        survey=OuterRef('pk'),
        user=OuterRef(OuterRef('pk'))
    ).order_by('-submitted')

    pending_survey_runs = survey_runs.filter(submitted__isnull=True)

    surveys = Survey.objects.filter(
        users=OuterRef('pk')
    ).annotate(
        latest_submission_date=Subquery(
            survey_runs.filter(submitted__isnull=False).values('submitted')[:1]
        )
    ).annotate(
        has_survey_runs=Exists(survey_runs)
    ).annotate(
        has_pending_runs=Exists(pending_survey_runs)
    ).filter(
        Q(has_survey_runs=False) | # either has no runs for this survey or
        ( # has no pending runs and submission date meets criteria
            Q(has_pending_runs=False, latest_submission_date__lte=today - F('interval'))
        )
    )

    return User.objects.annotate(has_survey_due=Exists(surveys)).filter(has_survey_due=True)

UPDATE #4:

I tried to isolate the issue by creating a function which would make most of the annotations on the Surveys by user in an attempt to check the annotation on that level prior to querying the User model with it.

def annotate_surveys_for_user(user):
    today = timezone.now()

    survey_runs = SurveyRun.objects.filter(
        survey=OuterRef('pk'),
        user=user
    ).order_by('-submitted')

    pending_survey_runs = survey_runs.filter(submitted=None)

    return Survey.objects.filter(
            users=user
        ).annotate(
            latest_submission_date=Subquery(
                survey_runs.filter(submitted__isnull=False).values('submitted')[:1]
            )
        ).annotate(
            has_survey_runs=Exists(survey_runs)
        ).annotate(
            has_pending_runs=Exists(pending_survey_runs)
        )

This worked as expected. Where the annotations were accurate and filtering with:

result.filter(
    Q(has_survey_runs=False) |
        (
           Q(has_pending_runs=False) &
           Q(latest_submission_date__lte=today - F('interval'))
        )
    )

produced the desired results: An empty queryset where the user should not have any runs due and vice-versa. Why is this not working when making it the subquery and querying from the User model?

like image 657
Verbal_Kint Avatar asked Aug 26 '19 15:08

Verbal_Kint


People also ask

What is annotate in Django QuerySet?

annotate()Annotates each object in the QuerySet with the provided list of query expressions.

How does annotate help build more effective QuerySet?

Appending the annotate() clause onto a QuerySet lets you add an attribute to each item in the QuerySet, like if you wanted to count the amount of articles in each category. However, sometimes you only want to count objects that match a certain condition, for example only counting articles that are published.

What is difference between annotate and aggregate in Django?

Unlike aggregate() , annotate() is not a terminal clause. The output of the annotate() clause is a QuerySet ; this QuerySet can be modified using any other QuerySet operation, including filter() , order_by() , or even additional calls to annotate() .

How do I get QuerySet in Django?

You get a QuerySet by using your model's Manager . Each model has at least one Manager , and it's called objects by default. Access it directly via the model class, like so: >>> Blog.objects <django.db.models.manager.Manager object at ...> >>> b = Blog(name='Foo', tagline='Bar') >>> b.objects Traceback: ...


1 Answers

To annotate users with whether or not they have a survey due, I'd suggest to use a Subquery expression:

from django.db.models import Q, F, OuterRef, Subquery, Exists
from django.utils import timezone

today = timezone.now()

survey_runs = SurveyRun.objects.filter(survey=OuterRef('pk'), user=OuterRef(OuterRef('pk'))).order_by('-submitted')

pending_survey_runs = survey_runs.filter(submitted__isnull=True)

surveys = Survey.objects.filter(users=OuterRef('pk'))
   .annotate(latest_submission_date=Subquery(survey_runs.filter(submitted__isnull=False).values('submitted')[:1]))
   .annotate(has_survey_runs=Exists(survey_runs))
   .annotate(has_pending_runs=Exists(pending_survey_runs))
   .filter(Q(has_survey_runs=False) | Q(latest_submission_date__lte=today - F('interval')) & Q(has_pending_runs=False))

User.objects.annotate(has_survey_due=Exists(surveys))
    .filter(has_survey_due=True)

I'm still trying to figure out how to do the other one. You cannot annotate a queryset with another queryset, values must be field equivalents. Also you cannot use a Subquery as queryset parameter to Prefetch, unfortunately. But since you're using PostgreSQL you could use ArrayField to list the ids of the surveys in a wrapped value, but I haven't found a way to do that, as you can't use aggregate inside a Subquery.

like image 91
dirkgroten Avatar answered Sep 26 '22 17:09

dirkgroten