Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select distinct and group by on subset of columns

I'm using SQL Server 2008 and I got stuck in this vicious circle between DISTINCT and GROUP BY

I've got the following dummy table myTable:

ID   Street         City          PostalCode   ProjectID   Date        NameId
1    Bar Street     Sunny Beach   666          7           25/08/2013  111
2    Sin Street     Ibiza         999          5           12/06/2013  222 
3    Bar Street     Sunny Beach   666          7           07/08/2013  333
4    Bora Bora      Bora Bora     1000         10          17/07/2013  444
5    Sin Street     Ibiza         999          5           04/07/2013  555

I want to obtain all records (probably first occurrence) with distinct Addresses(Street, City, PostalCode) and ProjectIDs. For example the result here should be:

ID   Street         City          PostalCode   ProjectID   Date        NameId
1    Bar Street     Sunny Beach   666          7           25/08/2013  111
2    Sin Street     Ibiza         999          5           12/06/2013  222 
4    Bora Bora      Bora Bora     1000         10          17/07/2013  444

I've tried with DISTINCT on all columns, but this won't work, since ID is unique and always returns all the columns. Also tried Group by Street, City PostalCode ProjectID, but an error occurred regarding Date and NameId.

Column '' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

So how can I get a result, where a subset of columns are distinct?

like image 654
Anton Belev Avatar asked Aug 12 '13 11:08

Anton Belev


1 Answers

You want to use the row_number() function:

select t.ID, t.Street, t.City, t.PostalCode, t.ProjectID, t.Date, t.NameId
from (select t.*,
             row_number() over (partition by Street, City, PostalCode, ProjectId
                                order by id
                               ) as seqnum
      from t
     ) t
where seqnum = 1;

This is a window function that assigns a sequential value to rows with the same values in certain columns (defined by the partition by clause). The ordering within these rows is determined by the order by clause. In this case, it starts ordering with the lowest id in the group, so the outer query just selects the first one.

like image 90
Gordon Linoff Avatar answered Sep 30 '22 11:09

Gordon Linoff