Find rows with duplicate values in a column

Tags: sql, duplicates, postgresql, aggregate-functions, window-functions

I have a table author_data:

 author_id | author_name
 ----------+----------------
 9         | ernest jordan
 14        | k moribe
 15        | ernest jordan
 25        | william h nailon 
 79        | howard jason
 36        | k moribe

Now I need the result as:

 author_id | author_name                                                  
 ----------+----------------
 9         | ernest jordan
 15        | ernest jordan     
 14        | k moribe 
 36        | k moribe

That is, I need the author_id for the names having duplicate appearances. I have tried this statement:

select author_id,count(author_name)
from author_data
group by author_name
having count(author_name)>1

But it's not working. How can I get this?

406

asked Mar 28 '14 20:03

user3171906

3 Answers

I suggest a window function in a subquery:

SELECT author_id, author_name  -- omit the name here if you just need ids
FROM (
   SELECT author_id, author_name
        , count(*) OVER (PARTITION BY author_name) AS ct
   FROM   author_data
   ) sub
WHERE  ct > 1;

You will recognize the basic aggregate function count(). It can be turned into a window function by appending an OVER clause - just like any other aggregate function.

This way it counts rows per partition. Voilà.
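To see what the window function contributes, it can help to run the inner query on its own. The following is a minimal sketch, not part of the original answer. It assumes Python's built-in sqlite3 module with an in-memory copy of the sample table as a stand-in for PostgreSQL, so it runs without a server (window functions need SQLite 3.25 or newer). The SQL is the same as above, with an ORDER BY added only to make the output stable.

```python
import sqlite3

# In-memory stand-in for the author_data table from the question (assumption: SQLite, not PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author_data (author_id INTEGER PRIMARY KEY, author_name TEXT)")
conn.executemany(
    "INSERT INTO author_data VALUES (?, ?)",
    [(9, "ernest jordan"), (14, "k moribe"), (15, "ernest jordan"),
     (25, "william h nailon"), (79, "howard jason"), (36, "k moribe")],
)

# Inner query alone: no rows are collapsed, and ct holds the size of each row's partition.
counted = conn.execute(
    """
    SELECT author_id, author_name,
           count(*) OVER (PARTITION BY author_name) AS ct
    FROM   author_data
    ORDER  BY author_name, author_id
    """
).fetchall()
for row in counted:
    print(row)

# Full query: the outer WHERE keeps only rows whose partition has more than one member.
duplicates = conn.execute(
    """
    SELECT author_id, author_name
    FROM  (
       SELECT author_id, author_name,
              count(*) OVER (PARTITION BY author_name) AS ct
       FROM   author_data
       ) sub
    WHERE  ct > 1
    ORDER  BY author_name, author_id
    """
).fetchall()
print(duplicates)
```

Unlike GROUP BY, the window function leaves every row in place, which is why author_id is still available next to the count.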

It has to be done in a subquery because the result of a window function cannot be referenced in the WHERE clause of the same SELECT: window functions are evaluated after WHERE. See:

Best way to get result count before LIMIT was applied
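One way to check this restriction is to put the window function directly into WHERE and see the query refused. This is a rough sketch that again assumes Python's sqlite3 module as a stand-in for PostgreSQL. The error text differs between the two systems, but both reject a window function in WHERE.

```python
import sqlite3

# Empty stand-in table; the statement fails at parse time, so no rows are needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author_data (author_id INTEGER PRIMARY KEY, author_name TEXT)")

# WHERE is evaluated before window functions are computed,
# so a window function is not allowed inside it.
try:
    conn.execute(
        """
        SELECT author_id, author_name
        FROM   author_data
        WHERE  count(*) OVER (PARTITION BY author_name) > 1
        """
    )
    rejected = False
except sqlite3.OperationalError as exc:
    rejected = True
    print("rejected:", exc)
```

Wrapping the window function in a subquery turns ct into an ordinary column, which the outer WHERE can filter on.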

In older versions without window functions (PostgreSQL 8.3 or older), or generally, this alternative performs well:

SELECT author_id, author_name  -- omit name, if you just need ids
FROM   author_data a
WHERE  EXISTS (
   SELECT 1 FROM author_data a2  -- the select list can be left empty in PostgreSQL 9.4 or later
   WHERE  a2.author_name = a.author_name
   AND    a2.author_id <> a.author_id
   );

If you are concerned with performance, add an index on author_name.
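As a sketch of that last step, the snippet below creates such an index and runs the EXISTS variant against the sample data. It assumes Python's sqlite3 module as a stand-in for PostgreSQL, and the index name author_data_author_name_idx is a made-up example. The CREATE INDEX statement itself is also valid PostgreSQL.

```python
import sqlite3

# In-memory stand-in for the author_data table from the question (assumption: SQLite, not PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author_data (author_id INTEGER PRIMARY KEY, author_name TEXT)")
conn.executemany(
    "INSERT INTO author_data VALUES (?, ?)",
    [(9, "ernest jordan"), (14, "k moribe"), (15, "ernest jordan"),
     (25, "william h nailon"), (79, "howard jason"), (36, "k moribe")],
)

# Hypothetical index name; it lets the EXISTS subquery look up matching names
# without scanning the whole table for every outer row.
conn.execute("CREATE INDEX author_data_author_name_idx ON author_data (author_name)")

# EXISTS variant; SELECT 1 is used because SQLite needs a non-empty select list.
rows = conn.execute(
    """
    SELECT author_id, author_name
    FROM   author_data a
    WHERE  EXISTS (
       SELECT 1 FROM author_data a2
       WHERE  a2.author_name = a.author_name
       AND    a2.author_id <> a.author_id
       )
    ORDER  BY author_name, author_id
    """
).fetchall()
print(rows)

# Confirm that the index was created.
index_names = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")]
print(index_names)
```

Whether the planner actually uses the index depends on table size and statistics; in PostgreSQL, EXPLAIN shows the chosen plan.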

143

answered Oct 19 '22 10:10

Erwin Brandstetter

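You are halfway there already. Your query identifies the duplicated names; you just need to use those names to fetch the rest of the data. Note that the subquery has to return author_name rather than author_id: PostgreSQL rejects a selected column that is neither listed in GROUP BY nor wrapped in an aggregate function.

Try this:

SELECT author_id, author_name
FROM author_data
WHERE author_name in (select author_name
        from author_data
        group by author_name
        having count(author_name)>1)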

answered Oct 19 '22 11:10

SoulTrain

You could join the table to itself, which is achievable with either of the following queries. DISTINCT keeps each row once even when a name occurs more than twice, and the row order is not guaranteed without an ORDER BY:

SELECT DISTINCT a1.author_id, a1.author_name
FROM author_data a1
INNER JOIN author_data a2
  ON a1.author_id <> a2.author_id
  AND a1.author_name = a2.author_name;

-- 9 |ernest jordan
-- 15|ernest jordan
-- 14|k moribe
-- 36|k moribe

--OR

SELECT DISTINCT a1.author_id, a1.author_name
FROM author_data a1
CROSS JOIN author_data a2
WHERE a1.author_id <> a2.author_id
  AND a1.author_name = a2.author_name;

-- 9 |ernest jordan
-- 15|ernest jordan
-- 14|k moribe
-- 36|k moribe
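One caveat with the self-join approach: a row is returned once per matching partner, so a name that occurs three times yields each of its ids twice unless the result is deduplicated. The sketch below illustrates this with made-up rows (they are not from the question). It assumes Python's sqlite3 module as a stand-in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author_data (author_id INTEGER PRIMARY KEY, author_name TEXT)")
# Hypothetical data: the same name appears three times.
conn.executemany(
    "INSERT INTO author_data VALUES (?, ?)",
    [(1, "k moribe"), (2, "k moribe"), (3, "k moribe"), (4, "howard jason")],
)

# Self-join template; {distinct} is filled with either "" or "DISTINCT".
join_sql = """
    SELECT {distinct} a1.author_id, a1.author_name
    FROM   author_data a1
    JOIN   author_data a2
      ON   a1.author_name = a2.author_name
     AND   a1.author_id <> a2.author_id
    ORDER  BY a1.author_id
"""

# Without DISTINCT: each of the three rows pairs with two partners, giving six rows.
plain = conn.execute(join_sql.format(distinct="")).fetchall()
# With DISTINCT: each duplicated row is reported once.
deduplicated = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()

print(len(plain), plain)
print(len(deduplicated), deduplicated)
```

With the sample data from the question every duplicated name occurs exactly twice, so the difference does not show up there.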

answered Oct 19 '22 11:10

coisnepe
