I have a set of tables that contain content that is created and voted on by users.
Table content_a
    id /* the id of the content */
    user_id /* the user that contributed the content */
    content /* the content */
Table content_b
    id
    user_id
    content
Table content_c
    id
    user_id
    content
Table voting
    user_id /* the user that made the vote */
    content_id /* the content the vote was made on */
    content_type_id /* the content type the vote was made on */
    vote /* the value of the vote, either +1 or -1 */
I want to be able to select a set of users and order them by the sum of the votes on the content they have produced. For example,
SELECT * FROM users ORDER BY <sum of votes on all content associated with user>
Is there a specific way this can be achieved using Django's ORM, or do I have to use a raw SQL query? And what would the most efficient way be to achieve this in raw SQL?
Update
Assuming the models are
from django.contrib.auth.models import User
# On Django 1.9+, GenericForeignKey lives in django.contrib.contenttypes.fields,
# and ForeignKey requires an on_delete argument from Django 2.0.
from django.contrib.contenttypes import generic
from django.contrib.contenttypes.models import ContentType
from django.db import models

class ContentA(models.Model):
    user = models.ForeignKey(User)
    content = models.TextField()

class ContentB(models.Model):
    user = models.ForeignKey(User)
    content = models.TextField()

class ContentC(models.Model):
    user = models.ForeignKey(User)
    content = models.TextField()

class GenericVote(models.Model):
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey()
    user = models.ForeignKey(User)  # the user who cast the vote
    vote = models.IntegerField(default=1)
Option A. Using GenericVote
GenericVote.objects.extra(select={'uid': """
    CASE
        WHEN content_type_id = {ct_a} THEN (SELECT user_id FROM {ContentA._meta.db_table} WHERE id = object_id)
        WHEN content_type_id = {ct_b} THEN (SELECT user_id FROM {ContentB._meta.db_table} WHERE id = object_id)
        WHEN content_type_id = {ct_c} THEN (SELECT user_id FROM {ContentC._meta.db_table} WHERE id = object_id)
    END""".format(
        ct_a=ContentType.objects.get_for_model(ContentA).pk,
        ct_b=ContentType.objects.get_for_model(ContentB).pk,
        ct_c=ContentType.objects.get_for_model(ContentC).pk,
        ContentA=ContentA,
        ContentB=ContentB,
        ContentC=ContentC
    )}).values('uid').annotate(vc=models.Sum('vote')).order_by('-vc')
The above ValuesQuerySet (or use values_list()) gives you a sequence of User IDs in descending order of vote total. You could then use it to fetch the top users.
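Because the queryset yields only IDs and totals, fetching the User objects is a separate step, and a plain pk__in lookup gives no guarantee of keeping the vote order. Here is a minimal sketch of that reordering step, written in plain Python so it runs without a Django project. The helper name attach_users is hypothetical, and the users_by_id dict stands in for what User.objects.in_bulk(ids) would return in Django.

```python
def attach_users(rows, users_by_id):
    """Pair each user with its vote total, keeping the order of `rows`.

    rows        -- dicts like {'uid': 2, 'vc': 3}, already sorted by vc descending
    users_by_id -- mapping of user id -> user object (hypothetical stand-in
                   for the dict returned by User.objects.in_bulk(ids))
    """
    ranked = []
    for row in rows:
        user = users_by_id.get(row['uid'])
        # uid is None when the CASE expression matched no content type,
        # and a user may no longer exist; skip both cases.
        if user is not None:
            ranked.append((user, row['vc']))
    return ranked


# Illustrative data only: strings stand in for User instances.
rows = [{'uid': 2, 'vc': 3}, {'uid': 1, 'vc': 2},
        {'uid': None, 'vc': 1}, {'uid': 3, 'vc': -1}]
users_by_id = {1: 'first', 2: 'second', 3: 'third'}
top_users = attach_users(rows, users_by_id)
print(top_users)  # [('second', 3), ('first', 2), ('third', -1)]
```

Walking the already-sorted rows and looking each user up in a dict keeps the database ordering intact with a single extra query for the users.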
Option B. Using User.objects.raw
With User.objects.raw, I ended up with almost the same query as the answer given by forsvarir:
User.objects.raw("""
    SELECT "{user_tbl}".*, SUM("gv"."vc") AS vote_count FROM {user_tbl},
        (SELECT id, user_id, {ct_a} AS ct FROM {ContentA._meta.db_table} UNION
         SELECT id, user_id, {ct_b} AS ct FROM {ContentB._meta.db_table} UNION
         SELECT id, user_id, {ct_c} AS ct FROM {ContentC._meta.db_table}
        ) AS c,
        (SELECT content_type_id, object_id, SUM("vote") AS vc FROM {GenericVote._meta.db_table} GROUP BY content_type_id, object_id) AS gv
    WHERE {user_tbl}.id = c.user_id
        AND gv.content_type_id = c.ct
        AND gv.object_id = c.id
    GROUP BY {user_tbl}.id
    ORDER BY vote_count DESC""".format(
        user_tbl=User._meta.db_table, ContentA=ContentA, ContentB=ContentB,
        ContentC=ContentC, GenericVote=GenericVote,
        ct_a=ContentType.objects.get_for_model(ContentA).pk,
        ct_b=ContentType.objects.get_for_model(ContentB).pk,
        ct_c=ContentType.objects.get_for_model(ContentC).pk
    ))
Option C. Other possible ways
Add a vote_count field to User or to a profile model, for example UserProfile, or to another related model, as suggested by Michael Dunn. This performs much better if you read vote_count frequently on the fly.
Create a database view that performs the UNIONs for you, then map a model to it; this could make the query easier to construct.
You need Django models that map those tables before you can query them with the Django ORM. Assuming User and Voting models that match the users and voting tables, you could then use:
User.objects.annotate(v=models.Sum('voting__vote')).order_by('v')
For a raw SQL solution, I've created a rough replication of your problem on ideone here
Data setup:
create table content_a(id int, user_id int, content varchar(20));
create table content_b(id int, user_id int, content varchar(20));
create table content_c(id int, user_id int, content varchar(20));
create table voting(user_id int, content_id int, content_type_id int, vote int);
create table users(id int, name varchar(20));
insert into content_a values(1,1,'aaaa');
insert into content_a values(2,1,'bbbb');
insert into content_a values(3,1,'cccc');
insert into content_b values(1,2,'dddd');
insert into content_b values(2,2,'eeee');
insert into content_b values(3,2,'ffff');
insert into content_c values(1,1,'gggg');
insert into content_c values(2,2,'hhhh');
insert into content_c values(3,3,'iiii');
insert into users values(1, 'first');
insert into users values(2, 'second');
insert into users values(3, 'third');
insert into users values(4, 'voteonly');
-- user 1 net votes (2)
insert into voting values (1, 1, 1, 1);
insert into voting values (2, 3, 1, -1);
insert into voting values (3, 1, 1, 1);
insert into voting values (4, 2, 1, 1);
-- user 2 net votes (3)
insert into voting values (1, 2, 2, 1);
insert into voting values (1, 1, 2, 1);
insert into voting values (2, 3, 2, -1);
insert into voting values (4, 2, 2, 1);
insert into voting values (4, 2, 3, 1);
-- user 3 net votes (-1)
insert into voting values (2, 3, 3, -1);
I've basically assumed that content_a has a type of 1, content_b has a type of 2 and content_c has a type of 3. Using raw SQL, there seem to be two obvious approaches. The first is to union all of the content together, then join it with the users and voting tables. I've tested this approach below.
select users.*, sum(voting.vote)
from users,
     voting, (
        SELECT id, 1 AS content_type_id, user_id
        FROM content_a
        UNION
        SELECT id, 2 AS content_type_id, user_id
        FROM content_b
        UNION
        SELECT id, 3 AS content_type_id, user_id
        FROM content_c) contents
where contents.user_id = users.id
  and voting.content_id = contents.id
  and voting.content_type_id = contents.content_type_id
group by users.id
order by sum(voting.vote) desc;
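To check the union approach against the sample data without a database server, here is a small sketch using Python's built-in sqlite3 module. It is a variation on the query rather than a copy: it uses explicit JOIN ... ON syntax and UNION ALL (the constant type column already keeps the three branches distinct), and the alias total is my own naming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table content_a(id int, user_id int, content varchar(20));
    create table content_b(id int, user_id int, content varchar(20));
    create table content_c(id int, user_id int, content varchar(20));
    create table voting(user_id int, content_id int, content_type_id int, vote int);
    create table users(id int, name varchar(20));
""")
# Same rows as the data setup above.
conn.executemany("insert into content_a values (?, ?, ?)",
                 [(1, 1, 'aaaa'), (2, 1, 'bbbb'), (3, 1, 'cccc')])
conn.executemany("insert into content_b values (?, ?, ?)",
                 [(1, 2, 'dddd'), (2, 2, 'eeee'), (3, 2, 'ffff')])
conn.executemany("insert into content_c values (?, ?, ?)",
                 [(1, 1, 'gggg'), (2, 2, 'hhhh'), (3, 3, 'iiii')])
conn.executemany("insert into users values (?, ?)",
                 [(1, 'first'), (2, 'second'), (3, 'third'), (4, 'voteonly')])
conn.executemany("insert into voting values (?, ?, ?, ?)", [
    (1, 1, 1, 1), (2, 3, 1, -1), (3, 1, 1, 1), (4, 2, 1, 1),  # on user 1's content_a
    (1, 2, 2, 1), (1, 1, 2, 1), (2, 3, 2, -1), (4, 2, 2, 1),  # on user 2's content_b
    (4, 2, 3, 1),                                             # on user 2's content_c
    (2, 3, 3, -1),                                            # on user 3's content_c
])

# Tag every content row with its type, then join owners and votes to it.
ranking = conn.execute("""
    select users.id, users.name, sum(voting.vote) as total
    from users
    join (select id, 1 as content_type_id, user_id from content_a
          union all
          select id, 2, user_id from content_b
          union all
          select id, 3, user_id from content_c) contents
      on contents.user_id = users.id
    join voting
      on voting.content_id = contents.id
     and voting.content_type_id = contents.content_type_id
    group by users.id, users.name
    order by total desc
""").fetchall()
print(ranking)  # [(2, 'second', 3), (1, 'first', 2), (3, 'third', -1)]
```

The totals match the comments in the data setup. User 4 has votes cast but no content, so the inner joins leave that user out of the result.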
The alternative would seem to be to outer join the content tables to the voting table, without the union step. This may perform better, but I haven't been able to test it because Visual Studio keeps rewriting my SQL for me... I'd expect the SQL to look something like this (but I haven't tested it):
select users.*, sum(voting.vote)
from users, voting, content_a, content_b, content_c
where users.id = content_a.user_id (+)
  and users.id = content_b.user_id (+)
  and users.id = content_c.user_id (+)
  and ((content_a.id = voting.content_id and voting.content_type_id = 1) OR
       (content_b.id = voting.content_id and voting.content_type_id = 2) OR
       (content_c.id = voting.content_id and voting.content_type_id = 3))
group by users.id
order by sum(voting.vote) desc;
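Since that query is untested, here is a hedged sketch of one union-free form that can be run with Python's sqlite3 module. It is not the same statement: it uses ANSI LEFT JOIN instead of the (+) markers, and it starts from voting rather than users so that each vote row matches at most one content row (assuming ids are unique within each content table). The alias total is my own naming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table content_a(id int, user_id int, content varchar(20));
    create table content_b(id int, user_id int, content varchar(20));
    create table content_c(id int, user_id int, content varchar(20));
    create table voting(user_id int, content_id int, content_type_id int, vote int);
    create table users(id int, name varchar(20));
""")
# Same rows as the data setup above.
for table, owners in (("content_a", (1, 1, 1)),
                      ("content_b", (2, 2, 2)),
                      ("content_c", (1, 2, 3))):
    conn.executemany(
        "insert into %s values (?, ?, 'x')" % table,  # content text is irrelevant here
        [(content_id, owner) for content_id, owner in enumerate(owners, start=1)])
conn.executemany("insert into users values (?, ?)",
                 [(1, 'first'), (2, 'second'), (3, 'third'), (4, 'voteonly')])
conn.executemany("insert into voting values (?, ?, ?, ?)", [
    (1, 1, 1, 1), (2, 3, 1, -1), (3, 1, 1, 1), (4, 2, 1, 1),
    (1, 2, 2, 1), (1, 1, 2, 1), (2, 3, 2, -1), (4, 2, 2, 1),
    (4, 2, 3, 1),
    (2, 3, 3, -1),
])

# Each vote joins only the content table named by its type; the other two
# joins yield NULLs, so COALESCE picks the single owner of the voted content.
ranking = conn.execute("""
    select users.id, users.name, sum(voting.vote) as total
    from voting
    left join content_a
      on voting.content_type_id = 1 and content_a.id = voting.content_id
    left join content_b
      on voting.content_type_id = 2 and content_b.id = voting.content_id
    left join content_c
      on voting.content_type_id = 3 and content_c.id = voting.content_id
    join users
      on users.id = coalesce(content_a.user_id, content_b.user_id, content_c.user_id)
    group by users.id, users.name
    order by total desc
""").fetchall()
print(ranking)  # [(2, 'second', 3), (1, 'first', 2), (3, 'third', -1)]
```

Driving the joins from voting is a deliberate choice. When all three content tables are joined to users at once, a user with rows in several tables produces a cross product of those rows, and a single vote can then be counted more than once.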