Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using DISTINCT inner join in SQL

I have three tables, A, B, C, where A is many to one B, and B is many to one C. I'd like a list of all C's in A.

My tables are something like this: A[id, valueA, lookupB], B[id, valueB, lookupC], C[id, valueC]. I've written a query with two nested SELECTs, but I'm wondering if it's possible to do INNER JOIN with DISTINCT somehow.

SELECT valueC FROM C INNER JOIN (     SELECT DISTINCT lookupC     FROM B INNER JOIN     (         SELECT DISTINCT lookupB         FROM A     )      A2 ON B.id = A2.lookupB )  B2 ON C.id = B2.lookupC 

EDIT: The tables are fairly large, A is 500k rows, B is 10k rows and C is 100 rows, so there are a lot of uneccesary info if I do a basic inner join and use DISTINCT in the end, like this:

SELECT DISTINCT valueC FROM  C INNER JOIN B on C.id = B.lookupB INNER JOIN A on B.id = A.lookupB 

This is very, very slow (magnitudes times slower than the nested SELECT I do above.

like image 693
Mats Fredriksson Avatar asked Oct 02 '08 08:10

Mats Fredriksson


People also ask

How write distinct in join query?

You can use CTE to get the distinct values of the second table, and then join that with the first table. You also need to get the distinct values based on LastName column. You do this with a Row_Number() partitioned by the LastName, and sorted by the FirstName.

How do you SELECT distinct columns in join queries?

DISTINCT is for all the columns in the select clause, which means all columns combined will be used to get distinct rows. TOP X will only select the number of rows mentioned in X .

Can we use distinct inside count?

Yes, you can use COUNT() and DISTINCT together to display the count of only distinct rows.

Is distinct better or GROUP BY?

DISTINCT is used to filter unique records out of all records in the table. It removes the duplicate rows. SELECT DISTINCT will always be the same, or faster than a GROUP BY.


1 Answers

I did a test on MS SQL 2005 using the following tables: A 400K rows, B 26K rows and C 450 rows.

The estimated query plan indicated that the basic inner join would be 3 times slower than the nested sub-queries, however when actually running the query, the basic inner join was twice as fast as the nested queries, The basic inner join took 297ms on very minimal server hardware.

What database are you using, and what times are you seeing? I'm thinking if you are seeing poor performance then it is probably an index problem.

like image 164
Darrel Miller Avatar answered Oct 09 '22 07:10

Darrel Miller