Django select only rows with duplicate field values

Tags: sql, django, django-orm

Suppose we have a Django model defined as follows:

class Literal(models.Model):
    name = models.CharField(...)
    ...

The name field is not unique, so it can hold duplicate values. I need to select all rows from the model whose name value also appears in at least one other row.

I know how to do it using plain SQL (maybe not the best solution):

select * from literal where name IN (
    select name from literal group by name having count(name) > 1
);

Is it possible to do this with the Django ORM? Or is there a better SQL solution?
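The SQL above can be sanity-checked against a small in-memory SQLite table. This is only a sketch: the table layout (id and name columns) and the sample rows are made up for illustration and are not taken from the real schema.

```python
import sqlite3

# Hypothetical sample data: "a" appears twice, "b" once, "c" three times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE literal (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO literal (name) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("c",), ("c",)],
)

# The query from the question: keep rows whose name occurs more than once.
rows = conn.execute(
    """
    SELECT id, name FROM literal
    WHERE name IN (
        SELECT name FROM literal GROUP BY name HAVING COUNT(name) > 1
    )
    ORDER BY id
    """
).fetchall()

print(rows)
```

Every row for "a" and "c" comes back, while the single "b" row is filtered out.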

225

asked Jan 24 '12 15:01

dragoon

3 Answers

Try:

from django.db.models import Count

(
    Literal.objects.values('name')
    .annotate(Count('id'))
    # order_by() clears any default ordering so it cannot affect the grouping
    .order_by()
    .filter(id__count__gt=1)
)

This is as close as you can get with Django. The problem is that this returns a ValuesQuerySet (since Django 1.9, a plain QuerySet that yields dictionaries) containing only name and id__count. However, you can use it to build a regular QuerySet by feeding it back into another query:

dupes = (
    Literal.objects.values('name')
    .annotate(Count('id'))
    .order_by()
    .filter(id__count__gt=1)
)
# Evaluate dupes and pass the duplicated names to a second query
Literal.objects.filter(name__in=[item['name'] for item in dupes])
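Because the list comprehension evaluates dupes in Python, this approach amounts to two separate queries. The sketch below imitates that with plain sqlite3 rather than Django; the SQL is only an approximation of what the ORM would send, and the table and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE literal (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO literal (name) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("c",)],
)

# Step 1: grouped query, comparable to values('name').annotate(Count('id')).
dupe_names = [
    name
    for name, id_count in conn.execute(
        "SELECT name, COUNT(id) FROM literal "
        "GROUP BY name HAVING COUNT(id) > 1 ORDER BY name"
    )
]

# Step 2: the names go back to the database as bound parameters.
placeholders = ", ".join("?" for _ in dupe_names)
full_rows = conn.execute(
    f"SELECT id, name FROM literal WHERE name IN ({placeholders}) ORDER BY id",
    dupe_names,
).fetchall()

print(dupe_names)
print(full_rows)
```

With many duplicated names, the parameter list in step 2 grows with them, which is one reason a single query with a subquery may be preferable.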

181

answered Oct 17 '22 21:10

Chris Pratt

This was rejected as an edit, so here it is as a better answer:

from django.db.models import Count

dups = (
    Literal.objects.values('name')
    .annotate(count=Count('id'))
    .values('name')
    .order_by()
    .filter(count__gt=1)
)

This returns a ValuesQuerySet containing all of the duplicate names. You can then use it to build a regular QuerySet by feeding it back into another query. The Django ORM is smart enough to combine the two into a single query:

Literal.objects.filter(name__in=dups)

The extra call to .values('name') after the annotate call looks a little strange, but without it the subquery fails. The second values() call tricks the ORM into selecting only the name column for the subquery.
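The likely reason the extra .values('name') matters is that a subquery on the right-hand side of IN has to return exactly one column. The sketch below shows that rule with plain sqlite3 instead of Django; the table, the sample rows, and the exact SQL are assumptions for illustration, and other databases word the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE literal (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO literal (name) VALUES (?)",
    [("a",), ("b",), ("a",)],
)

# Subquery returns one column (name): accepted.
one_column = conn.execute(
    "SELECT id, name FROM literal WHERE name IN ("
    "  SELECT name FROM literal GROUP BY name HAVING COUNT(id) > 1"
    ") ORDER BY id"
).fetchall()

# Subquery returns two columns (name and the count): rejected by SQLite.
try:
    conn.execute(
        "SELECT id, name FROM literal WHERE name IN ("
        "  SELECT name, COUNT(id) FROM literal"
        "  GROUP BY name HAVING COUNT(id) > 1"
        ")"
    ).fetchall()
    two_columns_failed = False
except sqlite3.Error:
    two_columns_failed = True

print(one_column, two_columns_failed)
```

The one-column form also keeps everything in a single round trip to the database.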

answered Oct 17 '22 22:10

Piper Merriam

Try using aggregation:

Literal.objects.values('name').annotate(name_count=Count('name')).exclude(name_count=1)
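Two details of this query are worth checking before relying on it. It yields one entry per name (name plus name_count) rather than full rows, and excluding a count of 1 is not quite the same as requiring a count above 1. The sqlite3 sketch below assumes a hypothetical nullable name column and assumes the ORM call corresponds roughly to HAVING NOT (COUNT(name) = 1); neither assumption comes from the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE literal (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO literal (name) VALUES (?)",
    [("a",), ("b",), ("a",), (None,)],
)

# COUNT(name) skips NULLs, so the NULL group gets a count of 0,
# and 0 is "not equal to 1", which lets that group through.
grouped = conn.execute(
    "SELECT name, COUNT(name) AS name_count FROM literal "
    "GROUP BY name HAVING NOT (COUNT(name) = 1)"
).fetchall()

print(grouped)
```

If name can never be NULL, the two conditions select the same groups; otherwise a filter on name_count greater than 1 is the safer choice.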

answered Oct 17 '22 20:10

JamesO
