
Oracle equivalent of Postgres' DISTINCT ON?

In Postgres, you can select the first row of each group with DISTINCT ON. How can this be achieved in Oracle?

From the Postgres manual:

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

For example, for a given table:

 col1 | col2
------+------
 A    | AB
 A    | AD
 A    | BC
 B    | AN
 B    | BA
 C    | AC
 C    | CC
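To try the queries on this page, the sample table can be recreated roughly as follows. This is a minimal sketch: the question does not give the DDL, so the `varchar(2)` column type is an assumption, and the rows are inserted one statement at a time so that the script also runs on older Oracle releases that do not accept multi-row VALUES lists.

```sql
-- Hypothetical setup for the sample data; column types are assumed.
create table tmp (
  col1 varchar(2),
  col2 varchar(2)
);

insert into tmp (col1, col2) values ('A', 'AB');
insert into tmp (col1, col2) values ('A', 'AD');
insert into tmp (col1, col2) values ('A', 'BC');
insert into tmp (col1, col2) values ('B', 'AN');
insert into tmp (col1, col2) values ('B', 'BA');
insert into tmp (col1, col2) values ('C', 'AC');
insert into tmp (col1, col2) values ('C', 'CC');
```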

Ascending sort:

> select distinct on(col1) col1, col2 from tmp order by col1, col2 asc;
 col1 | col2
------+------
 A    | AB
 B    | AN
 C    | AC

Descending sort:

> select distinct on(col1) col1, col2 from tmp order by col1, col2 desc;
 col1 | col2
------+------
 A    | BC
 B    | BA
 C    | CC
beerbajay asked May 09 '12 11:05



2 Answers

The same effect can be replicated in Oracle either by using the first_value() function or by using one of the rank() or row_number() functions.

Both variants also work in Postgres.

first_value()

select distinct col1,
       first_value(col2) over (partition by col1 order by col2 asc)
from tmp

first_value gives the first value for the partition, but repeats it for each row, so it is necessary to use it in combination with distinct to get a single row for each partition.
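To see why distinct is needed, the same query can be run without it. This is a sketch against the sample table from the question; the alias `first_col2` is added here only for readability.

```sql
-- Without DISTINCT, every input row is returned, each carrying
-- the first col2 value of its col1 partition.
select col1,
       first_value(col2) over (partition by col1 order by col2 asc) as first_col2
from tmp;

-- Result for the sample data (7 rows; row order is not guaranteed):
--   A | AB   (returned 3 times)
--   B | AN   (returned 2 times)
--   C | AC   (returned 2 times)
```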

row_number() / rank()

select col1, col2
from (
  select col1, col2,
         row_number() over (partition by col1 order by col2 asc) as rownumber
  from tmp
) foo
where rownumber = 1

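Replacing row_number() with rank() in this example yields the same result, because no (col1, col2) pair is duplicated in the sample data. If rows tie for first place within a partition, rank() returns all of them, while row_number() returns exactly one.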

A feature of this variant is that it can be used to fetch the first N rows for a given partition (e.g. "last 3 updated") simply by changing rownumber = 1 to rownumber <= N.
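As a sketch of the top-N form, here is the same query with N = 2 against the sample table from the question. The outer `order by` is an addition to make the output order deterministic.

```sql
-- First two rows of each col1 partition, ordered by col2 ascending.
select col1, col2
from (
  select col1, col2,
         row_number() over (partition by col1 order by col2 asc) as rownumber
  from tmp
) foo
where rownumber <= 2
order by col1, col2;

-- Result for the sample data:
--   A | AB
--   A | AD
--   B | AN
--   B | BA
--   C | AC
--   C | CC
```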

beerbajay answered Sep 19 '22 15:09


If you have more than two columns, use beerbajay's answer as a subquery (note the DESC order):

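select col1, col2, col3, col4
from tmp
where (col1, col2) in (
  -- match on col1 as well, so a col2 value that also occurs
  -- in another group does not pull in rows from that group
  select distinct col1,
         first_value(col2) over (partition by col1 order by col2 desc) as col2
  from tmp
  -- WHERE you decide conditions
)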
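An alternative sketch for the multi-column case carries the extra columns through the row_number() variant from the first answer. It avoids the second reference to tmp and returns exactly one row per col1 even when col2 values repeat. The columns col3 and col4 are assumed to exist on tmp, as in the query above.

```sql
-- One row per col1: the row with the highest col2, with all columns kept.
select col1, col2, col3, col4
from (
  select col1, col2, col3, col4,
         row_number() over (partition by col1 order by col2 desc) as rownumber
  from tmp
  -- optional filter conditions go here
) foo
where rownumber = 1;
```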
Jeremy Thompson answered Sep 18 '22 15:09