
Specific complex SQL query and Django ORM?

Tags: sql, orm, django

I have a set of tables that contain content that is created and voted on by users.

Table content_a

id         /* the id of the content */
user_id    /* the user that contributed the content */
content    /* the content */

Table content_b

id
user_id
content

Table content_c

id
user_id
content

Table voting

user_id         /* the user that made the vote */
content_id      /* the content the vote was made on */
content_type_id /* the content type the vote was made on */
vote            /* the value of the vote, either +1 or -1 */

I want to be able to select a set of users and order them by the sum of the votes on the content they have produced. For example,

SELECT * FROM users ORDER BY <sum of votes on all content associated with user>

Is there a specific way this can be achieved using Django's ORM, or do I have to use a raw SQL query? And what would the most efficient way be to achieve this in raw SQL?

mburke13 asked May 20 '12 14:05


2 Answers

Update

Assuming the models are

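# This uses the older Django API. In Django 1.7+ GenericForeignKey lives in
# django.contrib.contenttypes.fields, and Django 2.0+ requires on_delete on each ForeignKey.
from django.contrib.auth.models import User
from django.contrib.contenttypes import generic
from django.contrib.contenttypes.models import ContentType
from django.db import models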


class ContentA(models.Model):
    user = models.ForeignKey(User)
    content = models.TextField()

class ContentB(models.Model):
    user = models.ForeignKey(User)
    content = models.TextField()

class ContentC(models.Model):
    user = models.ForeignKey(User)
    content = models.TextField()

class GenericVote(models.Model):
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey()
    user = models.ForeignKey(User)
    vote = models.IntegerField(default=1)

Option A. Using GenericVote

GenericVote.objects.extra(select={'uid': """
    CASE
        WHEN content_type_id = {ct_a} THEN (SELECT user_id FROM {ContentA._meta.db_table} WHERE id = object_id)
        WHEN content_type_id = {ct_b} THEN (SELECT user_id FROM {ContentB._meta.db_table} WHERE id = object_id)
        WHEN content_type_id = {ct_c} THEN (SELECT user_id FROM {ContentC._meta.db_table} WHERE id = object_id)
    END""".format(
        ct_a=ContentType.objects.get_for_model(ContentA).pk,
        ct_b=ContentType.objects.get_for_model(ContentB).pk,
        ct_c=ContentType.objects.get_for_model(ContentC).pk,
        ContentA=ContentA,
        ContentB=ContentB,
        ContentC=ContentC
    )}).values('uid').annotate(vc=models.Sum('vote')).order_by('-vc')

The resulting ValuesQuerySet (or the equivalent values_list() call) yields User IDs in descending order of total votes. You can then use those IDs to fetch the top users.
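One way to turn those IDs into User objects while keeping the vote order is sketched below. The helper name top_users and its limit parameter are hypothetical; the sketch relies on Django's in_bulk(), which returns a dict keyed by primary key, and takes the model class as an argument so the snippet itself does not import Django.

```python
def top_users(vote_rows, user_model, limit=10):
    """Return [(user, vote_total), ...] in the same order as vote_rows.

    vote_rows:  iterable of {'uid': ..., 'vc': ...} dicts, e.g. the result of
                the .values('uid').annotate(vc=...) query above.
    user_model: the Django User model class (hypothetical helper argument).
    """
    # uid is NULL when a vote points at an unknown content type or a deleted row.
    rows = [row for row in vote_rows if row["uid"] is not None][:limit]
    # in_bulk() fetches all matching users in one query: {pk: instance}.
    users = user_model.objects.in_bulk([row["uid"] for row in rows])
    # Rebuild the list in vote order; skip IDs that have no matching user.
    return [(users[row["uid"]], row["vc"]) for row in rows if row["uid"] in users]
```

It would be called as top_users(queryset, User, limit=5), where queryset is the Option A query.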

Option B. Using User.objects.raw

With User.objects.raw, I ended up with almost the same query as the one in forsvarir's answer:

User.objects.raw("""
SELECT "{user_tbl}".*, SUM("gv"."vc") as vote_count from {user_tbl},
    (SELECT id, user_id, {ct_a} AS ct FROM {ContentA._meta.db_table} UNION
     SELECT id, user_id, {ct_b} AS ct FROM {ContentB._meta.db_table} UNION
     SELECT id, user_id, {ct_c} as ct FROM {ContentC._meta.db_table}
    ) as c,
   (SELECT content_type_id, object_id, SUM("vote") as vc FROM {GenericVote._meta.db_table} GROUP BY content_type_id, object_id) as gv
WHERE {user_tbl}.id = c.user_id
    AND gv.content_type_id = c.ct
    AND gv.object_id = c.id
GROUP BY {user_tbl}.id
ORDER BY vote_count DESC""".format(
    user_tbl=User._meta.db_table, ContentA=ContentA, ContentB=ContentB,
    ContentC=ContentC, GenericVote=GenericVote, 
    ct_a=ContentType.objects.get_for_model(ContentA).pk,
    ct_b=ContentType.objects.get_for_model(ContentB).pk,
    ct_c=ContentType.objects.get_for_model(ContentC).pk
))

Option C. Other possible ways

  • Denormalize vote_count onto the User model or a related model such as UserProfile, as suggested by Michael Dunn. This performs much better if you read vote_count frequently.
  • Build a DB view that does the UNIONs for you, then map a model to it; this could make the query easier to construct.
  • Sort in Python. This is often a practical way to handle large-scale data, because of the many toolkits and extension points available.
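The third option can be sketched in plain Python: load a mapping from each piece of content to its author, then sum the votes per author and sort. The function name rank_users, the tuple shapes, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict

def rank_users(content_owner, votes):
    """Sum votes per content author; return [(user_id, total), ...], highest first.

    content_owner: {(content_type_id, content_id): author_user_id}
    votes:         iterable of (content_type_id, content_id, vote) tuples
    """
    totals = defaultdict(int)
    for content_type_id, content_id, vote in votes:
        owner = content_owner.get((content_type_id, content_id))
        if owner is not None:  # ignore votes on content that no longer exists
            totals[owner] += vote
    # Sort by total descending; break ties by user id for a stable result.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

# Made-up data: user 1 wrote (type 1, id 1); user 2 wrote (type 2, id 1).
content_owner = {(1, 1): 1, (2, 1): 2}
votes = [(1, 1, 1), (1, 1, 1), (2, 1, -1), (3, 7, 1)]
ranking = rank_users(content_owner, votes)
print(ranking)  # [(1, 2), (2, -1)]
```

This trades one aggregate query for a full scan of the votes in application code, so it only pays off when the data is already loaded or when the database cannot express the ranking conveniently.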

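You need Django models that map to those tables before you can query them with the ORM. Assuming User and Voting models that match the users and voting tables, the query below sums the votes linked to each user through Voting.user (in the schema above that is the user who cast the vote, not the content's author, which is why the update above is needed):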

User.objects.annotate(v=models.Sum('voting__vote')).order_by('v')
okm answered Oct 01 '22 16:10


For a raw SQL solution, I've created a rough replication of your problem on ideone here

Data setup:

create table content_a(id int, user_id int, content varchar(20));
create table content_b(id int, user_id int, content varchar(20));
create table content_c(id int, user_id int, content varchar(20));
create table voting(user_id int, content_id int, content_type_id int, vote int);
create table users(id int, name varchar(20));
insert into content_a values(1,1,'aaaa');
insert into content_a values(2,1,'bbbb');
insert into content_a values(3,1,'cccc');
insert into content_b values(1,2,'dddd');
insert into content_b values(2,2,'eeee');
insert into content_b values(3,2,'ffff');
insert into content_c values(1,1,'gggg');
insert into content_c values(2,2,'hhhh');
insert into content_c values(3,3,'iiii');
insert into users values(1, 'first');
insert into users values(2, 'second');
insert into users values(3, 'third');
insert into users values(4, 'voteonly');

-- user 1 net votes (2)
insert into voting values (1, 1, 1, 1);
insert into voting values (2, 3, 1, -1);
insert into voting values (3, 1, 1, 1); 
insert into voting values (4, 2, 1, 1); 

-- user 2 net votes (3)
insert into voting values (1, 2, 2, 1);
insert into voting values (1, 1, 2, 1);
insert into voting values (2, 3, 2, -1);
insert into voting values (4, 2, 2, 1);
insert into voting values (4, 2, 3, 1);

-- user 3 net votes (-1)
insert into voting values (2, 3, 3, -1);

I've basically assumed that content_a has a type of 1, content_b has a type of 2 and content_c has a type of 3. Using raw SQL, there seem to be two obvious approaches. The first is to union all of the content together, then join it with the users and voting tables. I've tested this approach below.

select users.*, sum(voting.vote)
from users, 
    voting, (
        SELECT     id, 1 AS content_type_id, user_id
        FROM         content_a
        UNION
        SELECT     id, 2 AS content_type_id, user_id
        FROM         content_b
        UNION
        SELECT     id, 3 AS content_type_id, user_id
        FROM         content_c) contents
where contents.user_id = users.id
and voting.content_id = contents.id
and voting.content_type_id = contents.content_type_id
group by users.id
order by sum(voting.vote) desc;
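The union approach can be checked locally with Python's built-in sqlite3 module. The sketch below loads the sample data above into an in-memory database and runs a variant of the query (an adaptation, not the query tested above): it uses explicit LEFT JOINs with COALESCE so that users with no votes on their content, such as user 4, still appear with a total of 0, and UNION ALL because the content_type_id column already keeps the three branches distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table content_a(id int, user_id int, content varchar(20));
create table content_b(id int, user_id int, content varchar(20));
create table content_c(id int, user_id int, content varchar(20));
create table voting(user_id int, content_id int, content_type_id int, vote int);
create table users(id int, name varchar(20));
""")
# Same sample rows as the data setup above.
conn.executemany("insert into content_a values (?, ?, ?)",
                 [(1, 1, "aaaa"), (2, 1, "bbbb"), (3, 1, "cccc")])
conn.executemany("insert into content_b values (?, ?, ?)",
                 [(1, 2, "dddd"), (2, 2, "eeee"), (3, 2, "ffff")])
conn.executemany("insert into content_c values (?, ?, ?)",
                 [(1, 1, "gggg"), (2, 2, "hhhh"), (3, 3, "iiii")])
conn.executemany("insert into users values (?, ?)",
                 [(1, "first"), (2, "second"), (3, "third"), (4, "voteonly")])
conn.executemany("insert into voting values (?, ?, ?, ?)", [
    (1, 1, 1, 1), (2, 3, 1, -1), (3, 1, 1, 1), (4, 2, 1, 1),
    (1, 2, 2, 1), (1, 1, 2, 1), (2, 3, 2, -1), (4, 2, 2, 1), (4, 2, 3, 1),
    (2, 3, 3, -1),
])

rows = conn.execute("""
select u.id, u.name, coalesce(sum(v.vote), 0) as vote_total
from users u
left join (
    select id, 1 as content_type_id, user_id from content_a
    union all
    select id, 2 as content_type_id, user_id from content_b
    union all
    select id, 3 as content_type_id, user_id from content_c
) c on c.user_id = u.id
left join voting v
    on v.content_id = c.id and v.content_type_id = c.content_type_id
group by u.id, u.name
order by vote_total desc, u.id
""").fetchall()
print(rows)  # [(2, 'second', 3), (1, 'first', 2), (4, 'voteonly', 0), (3, 'third', -1)]
```

The totals for users 1, 2 and 3 match the net votes noted in the data setup comments.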

The alternative would seem to be to outer join the content tables to the voting table, without the union step. This may be more performant, but I haven't been able to test it because Visual Studio keeps rewriting my SQL for me... I'd expect the SQL to look something like this (but I haven't tested it):

select users.*, sum(voting.vote)
from users, voting, content_a, content_b, content_c
where users.id = content_a.user_id (+)
and users.id = content_b.user_id (+)
and users.id = content_c.user_id (+)
and ((content_a.id = voting.content_id and voting.content_type_id = 1) OR
     (content_b.id = voting.content_id and voting.content_type_id = 2) OR
     (content_c.id = voting.content_id and voting.content_type_id = 3))
group by users.id
order by sum(voting.vote) desc;
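As written, the untested query above joins each content table to users independently, so a user with rows in more than one content table gets every combination of those rows, and a single vote may be counted more than once. One possible way to skip the union while matching each vote to at most one content row is to start from voting and pick the author with COALESCE. The sketch below is an assumption about the intent, not the tested query: it uses ANSI LEFT JOIN syntax instead of Oracle's (+), runs on SQLite through Python's sqlite3 module, and uses a reduced, made-up data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table content_a(id int, user_id int, content varchar(20));
create table content_b(id int, user_id int, content varchar(20));
create table content_c(id int, user_id int, content varchar(20));
create table voting(user_id int, content_id int, content_type_id int, vote int);
create table users(id int, name varchar(20));
insert into users values (1, 'first'), (2, 'second');
-- user 1 owns rows in two content tables, user 2 in one
insert into content_a values (1, 1, 'aaaa'), (2, 1, 'bbbb');
insert into content_b values (1, 2, 'dddd');
insert into content_c values (1, 1, 'gggg');
-- user 1 net votes (3), user 2 net votes (-1)
insert into voting values (2, 1, 1, 1), (2, 2, 1, 1), (2, 1, 3, 1), (1, 1, 2, -1);
""")

# Each vote matches at most one content row, assuming ids are unique within
# each content table, so no vote is counted twice.
rows = conn.execute("""
select users.id, users.name, sum(voting.vote) as vote_total
from voting
left join content_a
    on voting.content_type_id = 1 and content_a.id = voting.content_id
left join content_b
    on voting.content_type_id = 2 and content_b.id = voting.content_id
left join content_c
    on voting.content_type_id = 3 and content_c.id = voting.content_id
join users
    on users.id = coalesce(content_a.user_id, content_b.user_id, content_c.user_id)
group by users.id, users.name
order by vote_total desc
""").fetchall()
print(rows)  # [(1, 'first', 3), (2, 'second', -1)]
```

Whether this beats the union version depends on the database and its indexes, so it is worth comparing query plans on real data.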
forsvarir answered Oct 01 '22 14:10