suppose we have a model in django defined as follows:
class Literal:
name = models.CharField(...)
...
Name field is not unique, and thus can have duplicate values. I need to accomplish the following task:
Select all rows from the model that have at least one duplicate value of the name
field.
I know how to do it using plain SQL (may be not the best solution):
select * from literal where name IN (
select name from literal group by name having count((name)) > 1
);
So, is it possible to select this using django ORM? Or better SQL solution?
The SQL DISTINCT keyword, which we have already discussed is used in conjunction with the SELECT statement to eliminate all the duplicate records and by fetching only the unique records.
Once you have grouped data you can filter out duplicates by using having clause. Having clause is the counterpart of where clause for aggregation queries. Just remember to provide a temporary name to count() data in order to use them in having clause.
If you do not include DISTINCT in a SELECT clause, you might find duplicate rows in your result, because SQL returns the JOB column's value for each row that satisfies the search condition. Null values are treated as duplicate rows for DISTINCT.
Introduction to SQL DISTINCT operator It doesn't delete duplicate rows in the table. If you want to select two columns and remove duplicates in one column, you should use the GROUP BY clause instead.
Try:
from django.db.models import Count Literal.objects.values('name') .annotate(Count('id')) .order_by() .filter(id__count__gt=1)
This is as close as you can get with Django. The problem is that this will return a ValuesQuerySet
with only name
and count
. However, you can then use this to construct a regular QuerySet
by feeding it back into another query:
dupes = Literal.objects.values('name') .annotate(Count('id')) .order_by() .filter(id__count__gt=1) Literal.objects.filter(name__in=[item['name'] for item in dupes])
This was rejected as an edit. So here it is as a better answer
dups = ( Literal.objects.values('name') .annotate(count=Count('id')) .values('name') .order_by() .filter(count__gt=1) )
This will return a ValuesQuerySet
with all of the duplicate names. However, you can then use this to construct a regular QuerySet
by feeding it back into another query. The django ORM is smart enough to combine these into a single query:
Literal.objects.filter(name__in=dups)
The extra call to .values('name')
after the annotate call looks a little strange. Without this, the subquery fails. The extra values tricks the ORM into only selecting the name column for the subquery.
try using aggregation
Literal.objects.values('name').annotate(name_count=Count('name')).exclude(name_count=1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With