PostgreSQL equivalent for MySQL GROUP BY

Tags:

sql

mysql

postgresql

group-by

aggregate-functions

I need to find duplicates in a table. In MySQL I simply write:

SELECT *,count(id) count FROM `MY_TABLE`
GROUP BY SOME_COLUMN ORDER BY count DESC

This query nicely:

Finds duplicates based on SOME_COLUMN, giving its repetition count.
Sorts in desc order of repetition, which is useful to quickly scan major dups.
Chooses a random value for all remaining columns, giving me an idea of values in those columns.

Similar query in Postgres greets me with an error:

column "MY_TABLE.SOME_COLUMN" must appear in the GROUP BY clause or be used in an aggregate function

What is the Postgres equivalent of this query?

PS: I know that MySQL behaviour deviates from SQL standards.

690

asked May 01 '12 13:05

jerrymouse

3 Answers

Back-ticks are a non-standard MySQL thing. Use the canonical double quotes to quote identifiers (possible in MySQL, too). That is, if your table in fact is named "MY_TABLE" (all upper case). If you (more wisely) named it my_table (all lower case), then you can remove the double quotes or use lower case.

Also, I use ct instead of count as alias, because it is bad practice to use function names as identifiers.

Simple case

This would work with PostgreSQL 9.1:

SELECT *, count(id) ct
FROM   my_table
GROUP  BY primary_key_column(s)
ORDER  BY ct DESC;

It requires primary key column(s) in the GROUP BY clause. The results are identical to a MySQL query, but ct would always be 1 (or 0 if id IS NULL) - useless to find duplicates.

Group by other than primary key columns

If you want to group by other column(s), things get more complicated. This query mimics the behavior of your MySQL query - and you can use *.

SELECT DISTINCT ON (1, some_column)
       count(*) OVER (PARTITION BY some_column) AS ct
      ,*
FROM   my_table
ORDER  BY 1 DESC, some_column, id, col1;

This works because DISTINCT ON (PostgreSQL specific), like DISTINCT (SQL standard), is applied after the window function count(*) OVER (...). Window functions (with the OVER clause) require PostgreSQL 8.4 or later; MySQL only added them in version 8.0.
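To see what DISTINCT ON has to work with, it can help to run the window function on its own. This is only a sketch, assuming my_table has the columns some_column, id and col1 used elsewhere in this thread:

```sql
-- Every row is kept; each one carries the size of its some_column group.
SELECT some_column, id, col1
     , count(*) OVER (PARTITION BY some_column) AS ct
FROM   my_table
ORDER  BY ct DESC, some_column, id, col1;
```

Each row of a group shows the same ct, so DISTINCT ON (ct, some_column) then only has to keep the first row per group according to the ORDER BY.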

Works with any table, regardless of primary or unique constraints.

The 1 in DISTINCT ON and ORDER BY is just shorthand to refer to the ordinal number of the item in the SELECT list.
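If the ordinal references feel too terse, the same idea can probably be spelled out with column names by moving the window function into a subquery. A sketch under the same assumptions about my_table (the alias sub is arbitrary):

```sql
-- Compute the per-group count first, then keep one row per (ct, some_column).
SELECT DISTINCT ON (ct, some_column) *
FROM  (
   SELECT *, count(*) OVER (PARTITION BY some_column) AS ct
   FROM   my_table
   ) sub
ORDER  BY ct DESC, some_column, id, col1;
```

The leading ORDER BY expressions have to match the DISTINCT ON expressions; the trailing id, col1 decide which row of each group is kept.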

SQL Fiddle to demonstrate both side by side.

More details in this closely related answer:

Select first row in each GROUP BY group?

`count(*)` vs. `count(id)`

If you are looking for duplicates, you are better off with count(*) than with count(id). There is a subtle difference if id can be NULL, because NULL values are not counted - while count(*) counts all rows. If id is defined NOT NULL, results are the same, but count(*) is generally more appropriate (and slightly faster, too).
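A small self-contained illustration of the difference, using an inline VALUES list rather than a real table (the names t, all_rows and non_null_ids are made up for this example):

```sql
-- count(*) counts rows; count(id) skips rows where id IS NULL.
SELECT count(*)  AS all_rows      -- 3
     , count(id) AS non_null_ids  -- 2
FROM  (VALUES (1), (2), (NULL)) AS t(id);
```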

178

answered Oct 07 '22 00:10

Erwin Brandstetter

Here's another approach that uses DISTINCT ON:

select distinct on (ct, some_column)
       *,
       count(id) over (partition by some_column) as ct
from   my_table x
order  by ct desc, some_column, id;

Data source:

CREATE TABLE my_table (some_column int, id int, col1 int);

INSERT INTO my_table  VALUES
 (1, 3,  4)
,(2, 4,  1)
,(2, 5,  1)
,(3, 6,  4)
,(3, 7,  3)
,(4, 8,  3)
,(4, 9,  4)
,(5, 10, 1)
,(5, 11, 2)
,(5, 11, 3);

Output:

SOME_COLUMN ID          COL1        CT
5           10          1           3
2           4           1           2
3           6           4           2
4           8           3           2
1           3           4           1

Live test: http://www.sqlfiddle.com/#!1/e2509/1

DISTINCT ON documentation: http://www.postgresonline.com/journal/archives/4-Using-Distinct-ON-to-return-newest-order-for-each-customer.html

answered Oct 06 '22 22:10

Michael Buen

MySQL allows GROUP BY to omit non-aggregated selected columns from the GROUP BY list, which it handles by returning the first row found for each unique combination of grouped-by columns. This is non-standard SQL behaviour.

PostgreSQL, on the other hand, is SQL standard compliant.

There is no equivalent query in PostgreSQL.

answered Oct 06 '22 22:10

Bohemian
