Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

django comment framework: distinct() does not work?

Running distinct() on any field of the comment model always returns all the records,

Comment.objects.values('user').distinct()

[{'user': 1}, {'user': 0}, {'user': 0}, {'user': 0}, {'user': 0}, {'user': 1}, {'user': 1}, {'user': 1}, {'user': 1}]

Comment.objects.values('ip_address').distinct()

[{'ip_address': u'127.0.0.1'},{'ip_address': u'192.168.0.180'}, {'ip_address':u'192.168.0.180'}, {'ip_address': u'192.168.0.180'}, {'ip_address': u'192.168.0. 180'}, {'ip_address': u'192.168.0.180'}, {'ip_address': u'192.168.0.180'}, {'ip_address': u'192.168.0.180'}, {'ip_address': u'192.168.0.180'}]

Why is this happening? Is there a way around this? Thanks!

ps: distinct() does run very well in different types of fields of a custom model during my test. Something special about the Comments framework?

Bit of conclusion Thanks everybody answering this question, combined with some reading I get conclusion as following:

  1. values() influences the lookup fields in SELECT part of the final sql("values() takes optional positional arguments, *fields, which specify field names to which the SELECT should be limited")
  2. order_by() adds its parameter to the SELECT part as well.
  3. using distinct() in a look up will result the sql to look like this:

    SELECT DISTINCT [fields1, fields2, fields3] FROM ... WHERE...

    and the values of the fields all together decides whether a record is unique. The fields may come from values() or order_by() functions in the lookup.

  4. So the order_by() is adding some unwanted effects when combined with distinct(), the fields specified in order_by is also take into consideration whether a record is unique

  5. Django Comment has a hidden order_by parameter by default, thus creating the whole problem. Any model has a hidden order_by when returning the qs can cause the same problem.

  6. The way of solving it is by adding an empty order_by() at the end of the lookup, which removes the default order_by.
like image 798
Xun Yang Avatar asked Jan 26 '12 11:01

Xun Yang


3 Answers

Comment.objects.values('user').distinct().order_by()
like image 167
DrTyrsa Avatar answered Nov 07 '22 07:11

DrTyrsa


You can wrap your query in set;

distinct() does not go well with values() as per documentation

ip_sets = set(Comment.objects.order_by().values('ip_address'))
ip_list = list(set(Comment.objects.order_by().values('ip_address')))
like image 43
7guyo Avatar answered Nov 07 '22 05:11

7guyo


I haven't verified that this is the cause, but Comment model has a default ordering which influences distinct() method:

In [1]: print Comment.objects.values('ip_address').distinct().query
SELECT DISTINCT "django_comments"."ip_address", "django_comments"."submit_date" FROM "django_comments" ORDER BY "django_comments"."submit_date" ASC

It's a documented feature.

Now, how could it be that two comments have exactly the same timestamp? I suppose you're using MySQL which doesn't support anything less than a second.

And if you want to get rid of the default ordering, just do:

Comment.objects.order_by().values('ip_address').distinct()
like image 6
Tomasz Zieliński Avatar answered Nov 07 '22 06:11

Tomasz Zieliński