
Django - Complex query

Tags: python, django

Assuming I have two models:

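from django.db import models

class Profile(models.Model):
    pass  # some fields here

class Ratings(models.Model):
    # on_delete is mandatory from Django 2.0 onward
    profile = models.ForeignKey(Profile, on_delete=models.CASCADE)
    category = models.IntegerField()
    points = models.IntegerField()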

Assuming the following example of the MySQL table "ratings":

profile    |    category    |    points
   1                1               10
   1                1               4
   1                2               10
   1                3               0
   1                4               10
   1                4               10
   1                4               10
   1                5               0

My POST data contains the following values (along with values for other fields):

category_1_avg_val = 7
category_2_avg_val = 5
category_3_avg_val = 5
category_4_avg_val = 7
category_5_avg_val = 9

I want to filter for profiles whose average rating in each category is greater than or equal to the required value.
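To make the requirement concrete, here is a minimal pure-Python sketch (not Django code) that applies it to the example table. The names `ROWS`, `REQUIRED`, `category_averages` and `qualifies` are made up for illustration, and treating a category with no ratings as a failure is an assumption.

```python
from collections import defaultdict

# Rows copied from the example "ratings" table: (profile, category, points).
ROWS = [
    (1, 1, 10), (1, 1, 4), (1, 2, 10), (1, 3, 0),
    (1, 4, 10), (1, 4, 10), (1, 4, 10), (1, 5, 0),
]

# Required minimum averages per category, as in the POST data.
REQUIRED = {1: 7, 2: 5, 3: 5, 4: 7, 5: 9}


def category_averages(rows):
    """Return {profile: {category: average points}}."""
    buckets = defaultdict(lambda: defaultdict(list))
    for profile, category, points in rows:
        buckets[profile][category].append(points)
    return {
        profile: {cat: sum(pts) / len(pts) for cat, pts in cats.items()}
        for profile, cats in buckets.items()
    }


def qualifies(averages, required):
    """True only if every required category meets its minimum average.

    Assumption: a category without any ratings counts as not meeting it.
    """
    return all(
        cat in averages and averages[cat] >= minimum
        for cat, minimum in required.items()
    )


averages = category_averages(ROWS)
matching = [p for p, avgs in averages.items() if qualifies(avgs, REQUIRED)]
```

With the sample data, profile 1 averages 7 in category 1 but 0 in categories 3 and 5, so it should be excluded from the result.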

Some filters are applied first:

q1 = [('associated_with', search_for),
      ('profile_type__slug__exact', profile_type),
      ('gender__in', gender),
      ('rank__in', rank),
      ('styles__style__in', styles),
      ('age__gte', age_from),
      ('age__lte', age_to)]
q1_list = [Q(x) for x in q1 if x[1]]

q2 = [('user__first_name__icontains', search_term),
      ('user__last_name__icontains', search_term),
      ('profile_type__name__icontains', search_term),
      ('styles__style__icontains', search_term),
      ('rank__icontains', search_term)]
q2_list = [Q(x) for x in q2 if x[1]]

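import operator
from functools import reduce  # a builtin in Python 2, in functools in Python 3

# Start from all profiles so `objects` is always defined,
# even when both filter lists are empty.
objects = Profile.objects.all()

if q1_list:
    objects = objects.filter(reduce(operator.and_, q1_list))

if q2_list:
    objects = objects.filter(reduce(operator.or_, q2_list))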

if order_by_ranking_level == 'desc':
    objects = objects.order_by('-ranking_level').distinct()
else:
    objects = objects.order_by('ranking_level').distinct()

Now I want to keep only the profiles whose average points, grouped by category, are >= the per-category average values coming from the POST data.

I tried to do this one category at a time:

objects = objects.filter(
    ratings__category=1) \
    .annotate(avg_points=Avg('ratings__points'))\
    .filter(avg_points__gte=category_1_avg_val)


objects = objects.filter(
    ratings__category=2) \
    .annotate(avg_points=Avg('ratings__points'))\
    .filter(avg_points__gte=category_2_avg_val)

But I think this is wrong. Please help me out. It would be great if the result were a queryset.

Edit: Using the answer posted by hynekcer, I came up with a slightly different solution, since I already have a queryset of profiles that needs to be filtered further by rating.

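def check_ratings_avg(pr, rtd):
    """Return True if profile `pr` meets every minimum average in `rtd`."""
    qr = Ratings.objects.filter(profile__id=pr.id) \
        .values('category')\
        .annotate(points_avg=Avg('points'))
    qr = {i['category']: i['points_avg'] for i in qr}

    for cat, val in rtd.items():
        # A category with no ratings is missing from qr; treat it as a
        # failure instead of raising KeyError.
        if cat not in qr or qr[cat] < val:
            return False
    return True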


rtd = {1: category_1_avg_val, 2: category_2_avg_val, 3: category_3_avg_val,
       4: category_4_avg_val, 5: category_5_avg_val}
objects = [i for i in objects if check_ratings_avg(i, rtd)]
Aamir Rind asked Nov 05 '12 18:11


1 Answer

Your complex query requires, in principle, a subquery. Possible solutions are:

  • A subquery written with the 'extra' queryset method or as a raw SQL query. This is not DRY, and subqueries were unsupported by some database backends, e.g. by some versions of MySQL; Django itself has used subqueries in a limited way since Django 1.1.
  • Saving intermediate results into a temporary table in the database. This is not elegant in Django.
  • Emulating the outer query with a loop in Python. This is the best universal solution: a Python loop over data already aggregated by the first query can aggregate and filter it fast enough.

A) Subquery emulated by Python

from django.db.models import Q, Avg
from itertools import groupby
from myapp.models import Profile, Ratings

def iterator_filtered_by_average(dictionary):
    # Group by (profile, category) only; adding 'points' to values() would
    # also group by points and make the average meaningless.
    qr = Ratings.objects.values('profile', 'category').order_by(
            'profile', 'category').annotate(points_avg=Avg('points'))
    f = Q()
    for k, v in dictionary.items():
        f |= Q(category=k, points_avg__gte=v)
    # Each yielded item is a dict of the form {'profile': <profile id>}.
    for profile, grp in groupby(qr.filter(f).values('profile')):
        if len(list(grp)) == len(dictionary):
            yield profile

# example
FILTER_DATA = {1:category_1_avg_val, 2:category_2_avg_val, 3:category_3_avg_val,
               4:category_4_avg_val, 5:category_5_avg_val}
for row in iterator_filtered_by_average(FILTER_DATA):
    print(row)

This is a simple solution for the original question without later additional requirements.
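The counting step of the loop above can be tried without a database. The following is only a sketch under assumptions: `AGGREGATED` holds made-up rows standing in for the output of the aggregating query, and `profiles_meeting_all` is a hypothetical helper name, not part of Django.

```python
from itertools import groupby

# Hypothetical aggregated rows, already sorted by profile:
# (profile, category, points_avg)
AGGREGATED = [
    (1, 1, 7.0), (1, 2, 10.0), (1, 3, 0.0),
    (2, 1, 8.0), (2, 2, 6.0), (2, 3, 5.0),
]
REQUIRED = {1: 7, 2: 5, 3: 5}  # category -> minimum average


def profiles_meeting_all(rows, required):
    """Yield profiles whose count of passing categories equals len(required)."""
    passing = [
        (profile, cat) for profile, cat, avg in rows
        if cat in required and avg >= required[cat]
    ]
    # groupby only merges adjacent items, so the input must be sorted by profile.
    for profile, group in groupby(passing, key=lambda item: item[0]):
        if len(list(group)) == len(required):
            yield profile


result = list(profiles_meeting_all(AGGREGATED, REQUIRED))
```

Here profile 1 passes only two of the three categories and is dropped, while profile 2 passes all three and is kept. Comparing the number of passing rows with the number of required categories is what replaces the outer query.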

B) Solution with subqueries:
It is needed for the more detailed version of the question, because the initial filters are based on a field of type ManyToManyField and because the query contains a distinct clause:

# objects:  QuerySet that you get from your initial filters. Not yet executed.
if rtd:
    # Method `as_nested_sql` removes the `order_by` clause, unlike `as_sql`
    subquery3 = objects.values('id').query \
            .get_compiler(connection=connection).as_nested_sql()
    subquery2 = ("""SELECT profile_id, category, avg(points) AS points_avg
          FROM myapp_ratings
          WHERE profile_id in
          ( %s
          ) GROUP BY profile_id, category
            """ % subquery3[0], subquery3[1]
    )
    where_sql = ' OR '.join(
            'category = %d AND points_avg >= %%s' % cat for cat in rtd.keys()
    )
    subquery = (
        """SELECT profile_id
        FROM
        ( %s
        ) subquery2
        WHERE %s
        GROUP BY profile_id
        HAVING count(*) = %s
        """ % (subquery2[0], where_sql, len(rtd)),
        subquery2[1] + tuple(rtd.values())
    )
    assert order_by_ranking_level in ('asc', 'desc')
    mainquery = ("""SELECT myapp_profile.* FROM myapp_profile
      INNER JOIN
      ( %s
      ) subquery ON subquery.profile_id=myapp_profile.id
      ORDER BY ranking_level %s"""
        % (subquery[0], order_by_ranking_level), subquery[1]
    )
    objects = Profile.objects.raw(mainquery[0], params=mainquery[1])
return objects

Please replace every occurrence of myapp with the name of your application.

Example of SQL generated by this code:

SELECT myapp_profile.* FROM myapp_profile
  INNER JOIN
  ( SELECT profile_id
    FROM
    ( SELECT profile_id, category, avg(points) AS points_avg
      FROM myapp_ratings
      WHERE profile_id IN
      ( SELECT U0.`id` FROM `myapp_profile` U0 WHERE U0.`ranking_level` >= 4
      ) GROUP BY profile_id, category
    ) subquery2
    WHERE category = 1 AND points_avg >= 7 OR category = 2 AND points_avg >= 5
       OR category = 3 AND points_avg >= 5 OR category = 4 AND points_avg >= 7
       OR category = 5 AND points_avg >= 9
    GROUP BY profile_id
    HAVING count(*) = 5
  ) subquery ON subquery.profile_id=myapp_profile.id
  ORDER BY ranking_level asc

(For readability, this SQL was formatted by hand and the %s placeholders were replaced with their parameter values; the database engine, however, receives the parameters separately for security reasons.)
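The shape of this nested query, with parameters passed separately from the SQL text, can be checked against an in-memory SQLite database. This is a sketch under assumptions: the two tables contain only the columns the query needs, the sample rows are invented, and SQLite uses `?` placeholders where the MySQL backend uses `%s`. Unlike the code above, it also passes the category numbers as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myapp_profile (id INTEGER PRIMARY KEY, ranking_level INTEGER);
    CREATE TABLE myapp_ratings (profile_id INTEGER, category INTEGER, points INTEGER);
    INSERT INTO myapp_profile VALUES (1, 5), (2, 4), (3, 1);
    INSERT INTO myapp_ratings VALUES
        (1, 1, 10), (1, 1, 4), (1, 2, 10),
        (2, 1, 8), (2, 2, 3),
        (3, 1, 10), (3, 2, 10);
""")

required = {1: 7, 2: 5}  # category -> minimum average (made-up values)

# One "(category = ? AND points_avg >= ?)" term per required category.
where_sql = " OR ".join(
    "(category = ? AND points_avg >= ?)" for _ in required
)

sql = """
SELECT myapp_profile.id FROM myapp_profile
  INNER JOIN
  ( SELECT profile_id
    FROM
    ( SELECT profile_id, category, avg(points) AS points_avg
      FROM myapp_ratings
      WHERE profile_id IN
      ( SELECT id FROM myapp_profile WHERE ranking_level >= ?
      ) GROUP BY profile_id, category
    ) subquery2
    WHERE {where}
    GROUP BY profile_id
    HAVING count(*) = ?
  ) subquery ON subquery.profile_id = myapp_profile.id
  ORDER BY ranking_level ASC
""".format(where=where_sql)

# Parameters in the same order as the placeholders appear in the SQL.
params = [4]
for category, minimum in required.items():
    params += [category, minimum]
params.append(len(required))

matching_ids = [row[0] for row in conn.execute(sql, params)]
```

In this sample, profile 3 is removed by the innermost filter on `ranking_level`, profile 2 fails category 2, and only profile 1 remains.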


Your problem stems from Django's limited support for generating subqueries. Only some of the more complicated queries shown in the documentation create a subquery (e.g. aggregate after annotate, count after annotate, or aggregate after distinct, but not annotate after distinct or annotate after annotate). Complicated nested aggregations are simplified into a single query, which is unexpected.

Any other solution that executes a separate SQL query for every object returned by the first query is discouraged in production, although such solutions can be very useful for testing the results of a better one.

hynekcer answered Sep 20 '22 05:09