Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Duplicate elements in Django Paginate after `order_by` call

I'm using Django 1.7.7.

I'm wondering if anyone has experienced this. This is my query:

events = Event.objects.filter(
    Q(date__gt=my_date) | Q(date__isnull=True)
).filter(type__in=[...]).order_by('date')

When I try to then paginate it

p = Paginator(events, 10)
p.count  # Gives 91

event_ids = []
for i in xrange(1, p.count / 10 + 2):
    event_ids += [i.id for i in p.page(i)]

print len(event_ids)  # Still 91
print len(set(event_ids))  # 75

I noticed that if I removed the .order_by, I don't get any duplicates. I then tried just .order_by with Event.objects.all().order_by('date') which gave no duplicates.

Finally, I tried this:

events = Event.objects.filter(
    Q(date__gt=my_date) | Q(date__isnull=True)
).order_by('date')

p = Paginator(events, 10)
events.count()  # Gives 131
p.count  # Gives 131

event_ids = []
for i in xrange(1, p.count / 10 + 2):
    event_ids += [i.id for i in p.page(i)]

len(event_ids)  # Gives 131
len(set(event_ids))  # Gives 118

... and there are duplicates. Can anyone explain what's going on?

I dug into the Django source (https://github.com/django/django/blob/master/django/core/paginator.py#L46-L55) and it seems to be something to do with how Django slices the object_list.

Any help is appreciated. Thanks.

Edit: distinct() has no affect on the duplicates. There aren't any duplicates in the database and I don't think the query introduces any duplicates ([e for e in events.iterator()] doesn't produce any duplicates). It's just when the Paginator is slicing.

Edit2: Here's a more complete example

In [1]: from django.core.paginator import Paginator

In [2]: from datetime import datetime, timedelta

In [3]: my_date = timezone.now()

In [4]:   1 events = Event.objects.filter(
          2     Q(date__gt=my_date) | Q(date__isnull=True)
          3 ).order_by('date')

In [5]: events.count()
Out[5]: 134

In [6]: p = Paginator(events, 10)

In [7]: p.count
Out[7]: 134

In [8]: event_ids = []

In [9]:   1 for i in xrange(1, p.num_pages + 1):
          2     event_ids += [j.id for j in p.page(i)]

In [10]: len(event_ids)
Out[10]: 134

In [11]: len(set(event_ids))
Out[11]: 115
like image 878
davidvuong Avatar asked Jul 31 '15 01:07

davidvuong


2 Answers

Try to use .distinct() on your query before passing it to Paginator.

like image 87
Yaroslav Varkhol Avatar answered Oct 25 '22 03:10

Yaroslav Varkhol


oh, shot in the dark, but i think i might know what it is. i wasn't able to reproduce it in sqlite but using mysql. i think mysql trying to sort on a column that has the same value has it returning the same results during slicing

the pagination splicing basically does an sql statement of SELECT ... FROM ... WHERE (date > D OR date IS NULL) ORDER BY date ASC LIMIT X OFFSET X

But when date is null I'm not sure how mysql sorts it. So when I tried two sql queries of LIMIT 10 and LIMIT 10 OFFSET 10 it returned sets that had the same rows, while LIMIT 20 produce a unique set.

you can try to update your order_by to order_by('id', 'date') to have it sort by a unique field first and it may fix it.

like image 33
toad013 Avatar answered Oct 25 '22 05:10

toad013