Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django models - how to filter out duplicate values by PK after the fact?

I build a list of Django model objects by making several queries. Then I want to remove any duplicates, (all of these objects are of the same type with an auto_increment int PK), but I can't use set() because they aren't hashable.

Is there a quick and easy way to do this? I'm considering using a dict instead of a list with the id as the key.

like image 559
wsorenson Avatar asked Apr 13 '09 16:04

wsorenson


People also ask

How do I remove duplicates in Django ORM?

The way I've de-duplicated is with annotate , Max , Count , and filter . It's one of the more efficient ways I've found. The idea is that you start your queryset, then annotate a max or min of some value (creation date, sequential ID), and then filter the queryset for records with a count greater than one.

What is QuerySet?

A QuerySet is a collection of data from a database. A QuerySet is built up as a list of objects. QuerySets makes it easier to get the data you actually need, by allowing you to filter and order the data.


1 Answers

In general it's better to combine all your queries into a single query if possible. Ie.

q = Model.objects.filter(Q(field1=f1)|Q(field2=f2))

instead of

q1 = Models.object.filter(field1=f1)
q2 = Models.object.filter(field2=f2)

If the first query is returning duplicated Models then use distinct()

q = Model.objects.filter(Q(field1=f1)|Q(field2=f2)).distinct()

If your query really is impossible to execute with a single command, then you'll have to resort to using a dict or other technique recommended in the other answers. It might be helpful if you posted the exact query on SO and we could see if it would be possible to combine into a single query. In my experience, most queries can be done with a single queryset.

like image 84
gerdemb Avatar answered Sep 23 '22 13:09

gerdemb