Getting distinct rows from a left outer join

I am building an application that dynamically generates SQL to search for rows of a particular table (the main domain class, like an Employee).

There are three tables: Table1, Table2 and Table1Table2Map. Table1 has a many-to-many relationship with Table2, mapped through the Table1Table2Map table. But since Table1 is my main table, the relationship is effectively one-to-many.

My app generates a SQL query that returns a result set containing rows from all these tables. The select clause and joins don't change, whereas the where clause is generated based on user interaction. In any case, I don't want duplicate rows of Table1 in my result set, as it is the main table for result display. Right now the generated query looks like this:

select distinct Table1.Id as Id, Table1.Name, Table2.Description from Table1
left outer join Table1Table2Map on (Table1Table2Map.Table1Id = Table1.Id)
left outer join Table2 on (Table2.Id = Table1Table2Map.Table2Id)

For simplicity I have excluded the where clause. The problem is that when there are multiple rows in Table2 for a row in Table1, the result set still has duplicate rows of Table1 even though I have said distinct on Table1.Id, because it has to select all the matching rows in Table2.

To elaborate, consider a row in Table1 with Id = 1 that has two rows in Table1Table2Map, (1, 1) and (1, 2), mapping it to two rows in Table2 with ids 1 and 2. The query above returns duplicate rows for this case. I want the query to return the Table1 row with Id 1 only once, because only one row in Table2 is the active value for the corresponding entry in Table1 (this information is in the mapping table). Is there a way I can avoid getting duplicate rows of Table1?

I think there is some basic problem in the way I am trying to solve the problem, but I am not able to find out what it is. Thanks in advance.

497

asked Apr 25 '09 13:04

Nazgul

6 Answers

Try:

left outer join (select distinct YOUR_COLUMNS_HERE ...) SUBQUERY_ALIAS on ...

In other words, don't join directly against the table, join against a sub-query that limits the rows you join against.
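A rough sketch of what that could look like, run against an in-memory SQLite database as a stand-in for SQL Server. The sample rows, the `IsActive` column on the mapping table, and the names `make_db`, `search` and `ActiveMap` are all assumptions made for illustration, not something the question confirms.

```python
import sqlite3

def make_db():
    # Hypothetical sample data: Table1 row 1 maps to two Table2 rows, only one active.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Table1 (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Table2 (Id INTEGER PRIMARY KEY, Description TEXT);
        CREATE TABLE Table1Table2Map (Table1Id INTEGER, Table2Id INTEGER, IsActive INTEGER);
        INSERT INTO Table1 VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO Table2 VALUES (1, 'old'), (2, 'current');
        INSERT INTO Table1Table2Map VALUES (1, 1, 0), (1, 2, 1);
    """)
    return conn

# Join against a sub-query that has already been narrowed to the active mappings,
# instead of joining against the whole mapping table.
SUBQUERY_JOIN = """
SELECT Table1.Id, Table1.Name, Table2.Description
FROM Table1
LEFT OUTER JOIN (
    SELECT DISTINCT Table1Id, Table2Id
    FROM Table1Table2Map
    WHERE IsActive = 1          -- assumed column name
) AS ActiveMap ON ActiveMap.Table1Id = Table1.Id
LEFT OUTER JOIN Table2 ON Table2.Id = ActiveMap.Table2Id
ORDER BY Table1.Id
"""

def search(conn):
    return conn.execute(SUBQUERY_JOIN).fetchall()
```

With this sample data, `search(make_db())` gives one row per Table1 entry, and the unmapped row keeps a NULL description.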

185

answered Oct 10 '22 20:10

Lasse V. Karlsen

You can use GROUP BY on Table1.Id, and that will get rid of the extra rows. You wouldn't need to worry about any mechanics on the join side.

I came up with this solution in a huge query, and it didn't affect the query time much.
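A minimal sketch of the GROUP BY idea, again using in-memory SQLite as a stand-in for SQL Server with made-up sample rows. One thing to keep in mind: in SQL Server every selected column that is not in the GROUP BY has to be wrapped in an aggregate, so the sketch groups by Table1.Id and Table1.Name and takes MAX of the description. The names `make_db` and `grouped_search` are hypothetical.

```python
import sqlite3

def make_db():
    # Hypothetical sample data: Table1 row 1 maps to two Table2 rows.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Table1 (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Table2 (Id INTEGER PRIMARY KEY, Description TEXT);
        CREATE TABLE Table1Table2Map (Table1Id INTEGER, Table2Id INTEGER, IsActive INTEGER);
        INSERT INTO Table1 VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO Table2 VALUES (1, 'old'), (2, 'current');
        INSERT INTO Table1Table2Map VALUES (1, 1, 0), (1, 2, 1);
    """)
    return conn

# Collapse the joined rows to one per Table1 row; MAX() decides which
# description survives, which is not necessarily the active one.
GROUPED = """
SELECT Table1.Id, Table1.Name, MAX(Table2.Description) AS Description
FROM Table1
LEFT OUTER JOIN Table1Table2Map ON Table1Table2Map.Table1Id = Table1.Id
LEFT OUTER JOIN Table2 ON Table2.Id = Table1Table2Map.Table2Id
GROUP BY Table1.Id, Table1.Name
ORDER BY Table1.Id
"""

def grouped_search(conn):
    return conn.execute(GROUPED).fetchall()
```

The duplicates go away, but with this data the surviving description for Id 1 is 'old' (the alphabetical maximum), not the active 'current' row, so the aggregate has to be chosen with care.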

NOTE: I'm answering this question 3 years after it was asked, but I believe this may help someone.

answered Oct 10 '22 21:10

kommradHomer

You can re-write your left joins to be outer applies, so that you can use a top 1 and an order by as follows:

select Table1.Id as Id, Table1.Name, t2.Description
from Table1
outer apply (
   -- pick at most one mapping row per Table1 row
   select top 1 *
   from Table1Table2Map
   where (Table1Table2Map.Table1Id = Table1.Id) and Table1Table2Map.IsActive = 1
   order by somethingCol
) t1t2
outer apply (
   -- look up the Table2 row through the alias of the first apply
   select top 1 *
   from Table2
   where (Table2.Id = t1t2.Table2Id)
) t2;

Note that an outer apply without a "top" or an "order by" behaves like a left outer join; apply just gives you a little more control. (Likewise, cross apply behaves like an inner join.)

You can also do something similar using the row_number() function:

 select * from (
      -- no distinct needed: rowNum = 1 already keeps one row per Table1.Id
      select Table1.Id as Id, Table1.Name, Table2.Description,
        rowNum = row_number() over ( partition by Table1.Id order by something )
      from Table1
      left outer join Table1Table2Map on (Table1Table2Map.Table1Id = Table1.Id)
      left outer join Table2 on (Table2.Id = Table1Table2Map.Table2Id)
 ) x
 where rowNum = 1;

Most of this isn't needed if the IsActive flag can narrow your other tables down to one row, but these techniques might come in useful for you.
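Here is a small runnable sketch of the row_number() variant. It uses in-memory SQLite (3.25 or later, which added window functions) rather than SQL Server, so the T-SQL `rowNum = ...` alias becomes `AS RowNum`, and outer apply is left out because SQLite does not have it. The sample data, the ordering by `IsActive DESC`, and the names `make_db` and `first_row_per_id` are assumptions for illustration.

```python
import sqlite3

def make_db():
    # Hypothetical sample data: Table1 row 1 maps to two Table2 rows, only one active.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Table1 (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Table2 (Id INTEGER PRIMARY KEY, Description TEXT);
        CREATE TABLE Table1Table2Map (Table1Id INTEGER, Table2Id INTEGER, IsActive INTEGER);
        INSERT INTO Table1 VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO Table2 VALUES (1, 'old'), (2, 'current');
        INSERT INTO Table1Table2Map VALUES (1, 1, 0), (1, 2, 1);
    """)
    return conn

# Number the joined rows within each Table1.Id, active mappings first,
# then keep only the first row of every partition.
ROW_NUMBER_QUERY = """
SELECT Id, Name, Description
FROM (
    SELECT Table1.Id AS Id, Table1.Name AS Name, Table2.Description AS Description,
           ROW_NUMBER() OVER (
               PARTITION BY Table1.Id
               ORDER BY Table1Table2Map.IsActive DESC, Table2.Id
           ) AS RowNum
    FROM Table1
    LEFT OUTER JOIN Table1Table2Map ON Table1Table2Map.Table1Id = Table1.Id
    LEFT OUTER JOIN Table2 ON Table2.Id = Table1Table2Map.Table2Id
) AS x
WHERE RowNum = 1
ORDER BY Id
"""

def first_row_per_id(conn):
    return conn.execute(ROW_NUMBER_QUERY).fetchall()
```

The order by inside the window is what decides which duplicate wins, so it should encode whatever "the row I want" means for your data.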

answered Oct 10 '22 22:10

John Gibb

To elaborate on one point: you said that there is only one "active" row in Table2 per row in Table1. Is that row not marked as active such that you could put it in the where clause? Or is there some magic in the dynamic conditions supplied by the user that determines what's active and what isn't?

If you don't need to select anything from Table2, the solution is relatively simple in that you can use EXISTS, but since you've put Table2.Description in the select clause I'll assume that's not the case.
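For the case where nothing from Table2 has to be selected, a sketch of the EXISTS approach might look like this. It runs on in-memory SQLite as a stand-in for SQL Server; the sample rows, the LIKE filter standing in for a user-generated condition, and the names `make_db` and `find_table1` are hypothetical.

```python
import sqlite3

def make_db():
    # Hypothetical sample data: Table1 row 1 maps to two Table2 rows, row 2 to none.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Table1 (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Table2 (Id INTEGER PRIMARY KEY, Description TEXT);
        CREATE TABLE Table1Table2Map (Table1Id INTEGER, Table2Id INTEGER, IsActive INTEGER);
        INSERT INTO Table1 VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO Table2 VALUES (1, 'old'), (2, 'current');
        INSERT INTO Table1Table2Map VALUES (1, 1, 0), (1, 2, 1);
    """)
    return conn

# EXISTS only asks whether at least one matching Table2 row is there,
# so a Table1 row can never be repeated, however many rows match.
EXISTS_QUERY = """
SELECT Table1.Id, Table1.Name
FROM Table1
WHERE EXISTS (
    SELECT 1
    FROM Table1Table2Map
    JOIN Table2 ON Table2.Id = Table1Table2Map.Table2Id
    WHERE Table1Table2Map.Table1Id = Table1.Id
      AND Table2.Description LIKE ?
)
ORDER BY Table1.Id
"""

def find_table1(conn, description_pattern):
    return conn.execute(EXISTS_QUERY, (description_pattern,)).fetchall()
```

Even with the pattern `'%'`, which matches both Table2 rows mapped to Id 1, that Table1 row comes back once. Table1 rows without any mapping are filtered out, which differs from the left outer join in the question.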

Basically what separates the relevant rows in Table2 from the irrelevant ones? Is it an active flag or a dynamic condition? The first row? That's really how you should be removing duplicates.

DISTINCT clauses tend to be overused. That may not be the case here but it sounds like it's possible that you're trying to hack out the results you want with DISTINCT rather than solving the real problem, which is a fairly common problem.

answered Oct 10 '22 20:10

cletus

You have to include the IsActive condition in your join (and there is no need for distinct):

select Table1.Id as Id, Table1.Name, Table2.Description from Table1
left outer join Table1Table2Map on (Table1Table2Map.Table1Id = Table1.Id) and Table1Table2Map.IsActive = 1
left outer join Table2 on (Table2.Id = Table1Table2Map.Table2Id)
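To see why the condition belongs in the join rather than in the where clause, here is a sketch that runs both variants on in-memory SQLite (a stand-in for SQL Server). The sample rows and the names `make_db`, `JOIN_ON` and `JOIN_WHERE` are made up for illustration, and `IsActive` is assumed to live on the mapping table.

```python
import sqlite3

def make_db():
    # Hypothetical sample data: Table1 row 1 has one active and one inactive mapping,
    # Table1 row 2 has no mapping at all.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Table1 (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Table2 (Id INTEGER PRIMARY KEY, Description TEXT);
        CREATE TABLE Table1Table2Map (Table1Id INTEGER, Table2Id INTEGER, IsActive INTEGER);
        INSERT INTO Table1 VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO Table2 VALUES (1, 'old'), (2, 'current');
        INSERT INTO Table1Table2Map VALUES (1, 1, 0), (1, 2, 1);
    """)
    return conn

# Filter inside the ON clause: inactive mappings never join,
# and Table1 rows without a mapping are still returned.
JOIN_ON = """
SELECT Table1.Id, Table1.Name, Table2.Description
FROM Table1
LEFT OUTER JOIN Table1Table2Map
    ON Table1Table2Map.Table1Id = Table1.Id AND Table1Table2Map.IsActive = 1
LEFT OUTER JOIN Table2 ON Table2.Id = Table1Table2Map.Table2Id
ORDER BY Table1.Id
"""

# Same filter in the WHERE clause: rows with a NULL IsActive are dropped,
# so the left join effectively turns into an inner join.
JOIN_WHERE = """
SELECT Table1.Id, Table1.Name, Table2.Description
FROM Table1
LEFT OUTER JOIN Table1Table2Map ON Table1Table2Map.Table1Id = Table1.Id
LEFT OUTER JOIN Table2 ON Table2.Id = Table1Table2Map.Table2Id
WHERE Table1Table2Map.IsActive = 1
ORDER BY Table1.Id
"""

def run(conn, sql):
    return conn.execute(sql).fetchall()
```

With this data, `JOIN_ON` returns both Table1 rows (the second with a NULL description), while `JOIN_WHERE` returns only the row for Id 1.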

answered Oct 10 '22 22:10

Arvo

If you want to display multiple rows from Table2, you will have duplicate data from Table1 displayed. If you wanted to, you could use an aggregate function (e.g. MAX, MIN) on Table2; this would eliminate the duplicate rows from Table1, but would also hide some of the data from Table2.

See also my answer on question #70161 for additional explanation

answered Oct 10 '22 21:10

Nathan Koop

Tags:

sql

sql-server

sql-server-2005

linq-to-sql