Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Selecting columns with DISTINCT in PostgreSQL

I'm querying bus stops from a database, and I wish to have it only return 1 stop per bus line/direction. This query does just that:

Stop.select("DISTINCT line_id, direction")

Except that it won't give me any other attribute than those 2. I tried a couple of other queries to have it return the id in addition to the line_id and direction fields (ideally it would return all columns), with no luck:

Stop.select("DISTINCT line_id, direction, id")

and

Stop.select("DISTINCT(line_id || '-' || direction), id")

In both cases, the query loses its distinct clause and all rows are returned.

Some awesome dude helped me out and suggested to use a subquery to have it return all the ids:

Stop.find_by_sql("SELECT DISTINCT a1.line_id, a1.direction, (SELECT a2.id from stops a2 where a2.line_id = a1.line_id AND a2.direction = a1.direction ORDER BY a2.id ASC LIMIT 1) as id FROM stops a1

I can then extract all the ids and perform a 2nd query to fetch the full attributes for each stop.

Is there a way to have it all inside 1 query AND have it return all the attributes?

like image 294
samvermette Avatar asked Feb 15 '11 20:02

samvermette


People also ask

How do I get unique column values in PostgreSQL?

Syntax: SELECT DISTINCT ON (column_1) column_alias, column_2 FROM table_name ORDER BY column_1, column_2; As the order of rows returned from the SELECT statement is unpredictable which means the “first row” of each group of the duplicate is also unpredictable.

How do I SELECT one column as distinct?

Adding the DISTINCT keyword to a SELECT query causes it to return only unique values for the specified column list so that duplicate rows are removed from the result set.

How do I SELECT multiple columns in PostgreSQL?

If you specify a list of columns, you need to place a comma ( , ) between two columns to separate them. If you want to select data from all the columns of the table, you can use an asterisk ( * ) shorthand instead of specifying all the column names.

Can we use distinct on multiple columns?

In SQL multiple fields may also be added with DISTINCT clause. DISTINCT will eliminate those rows where all the selected fields are identical.


2 Answers

Stop.select("DISTINCT ON (line_id, direction) *")
like image 170
Pier-Olivier Thibault Avatar answered Sep 29 '22 03:09

Pier-Olivier Thibault


Not so fast - The other answer selects stop_id arbitrary

This is why your question makes no sense. We can pull stop_ids and have distinct line_id and direction. But we have no idea why we have the stop_id we do.

    create temp table test( line_id integer, direction char(1), stop_id      integer);
    insert into test values
            (1, 'N', 1),
            (1, 'N', 2),
            (1, 'S', 1),
            (1, 'S', 2),
            (2, 'N', 1),
            (2, 'N', 2),
            (2, 'S', 1),
            (2, 'S', 2)
    ;
    select distinct on (line_id, direction) * from test;
    -- do this again but will reverse the order of stop_ids
    -- could it possible change our Robust Query?!!!
    drop table test;
    create temp table test(line_id integer,direction char(1),stop_id integer);
    insert into test values
            (1, 'N', 2),
            (1, 'N', 1),
            (1, 'S', 2),
            (1, 'S', 1),
            (2, 'N', 2),
            (2, 'N', 1),
            (2, 'S', 2),
            (2, 'S', 1)
    ;
    select distinct on (line_id, direction) * from test;

First select:

line_id | direction | stop_id 
---------+-----------+---------
       1 | N         |       1
       1 | S         |       1
       2 | N         |       1
       2 | S         |       1

Second select:

line_id | direction | stop_id 
---------+-----------+---------
       1 | N         |       2
       1 | S         |       2
       2 | N         |       2
       2 | S         |       2

So we got away without grouping stop_id but we have no guarantees why we got the one we did. All we know is that this is valid stop_id. Any updates, inserts, and other stuff that no RDMS will guarantee can be changing around the physical order of rows.

This is what I meant in the top comment. There is no known reason for pulling one stop_id over the other one, but somehow you need this stop_id (or whatever else) desperately.

like image 38
nate c Avatar answered Sep 29 '22 03:09

nate c