Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Complex SQL join with group by

I'm trying to optimize a query which is taking a long time. The goal of the query is to get best similar F2 .(Specially similarity measure) This is an example of what I have:

 CREATE TABLE Test
(
   F1 varchar(124),
   F2 varchar(124),
   F3 varchar(124)
)
INSERT INTO TEST ( F1, F2, F3 ) VALUES ( 'A', 'B', 'C' )
INSERT INTO TEST (  F1, F2, F3 ) VALUES ( 'D', 'B', 'E' )
INSERT INTO TEST (  F1, F2, F3 ) VALUES ( 'F', 'I', 'G' )
INSERT INTO TEST (  F1, F2, F3 ) VALUES ( 'F', 'I', 'G' )
INSERT INTO TEST (  F1, F2, F3 ) VALUES ( 'D', 'B', 'C' )
INSERT INTO TEST (  F1, F2, F3 ) VALUES ( 'F', 'B', 'G' )
INSERT INTO TEST (  F1, F2, F3 ) VALUES ( 'D', 'I', 'C' )
INSERT INTO TEST (  F1, F2, F3 ) VALUES ( 'A', 'B', 'C' )
INSERT INTO TEST (  F1, F2, F3 ) VALUES ( 'A', 'B', 'K' )
INSERT INTO TEST (  F1, F2, F3 ) VALUES ( 'A', 'K', 'K' )

Now if I run this query:

SELECT B.f2,COUNT(*) AS CNT  
FROM 
(
select F1,F3 from Test
where F2='B'
 )AS A
    INNER JOIN  Test AS B
   ON A.F1 = B.F1 AND  A.F3 = B.F3
GROUP BY B.F2 
ORDER BY CNT DESC 

The table has 1m+ rows. What would be a better way to do this?

like image 452
jozi Avatar asked Sep 16 '12 04:09

jozi


2 Answers

You can write your query in this form too, and because you have one select so your retrieve time will be reduced

SELECT  Test_1.F2, COUNT(Test_1.F1) AS Cnt 
FROM    Test 
INNER JOIN Test AS Test_1 ON Test.F1 = Test_1.F1 AND Test.F3 = Test_1.F3 
WHERE   (Test.F2 = 'B') 
GROUP BY Test_1.F2
like image 186
Maryam Arshi Avatar answered Sep 28 '22 09:09

Maryam Arshi


Here is another way to write your query. Close to guido's answer runnable in MS SQL.

WITH Filtered AS (SELECT DISTINCT F1,F3 FROM Test WHERE F2='B')
SELECT B.f2,COUNT(*) AS CNT
  FROM Test B
       INNER JOIN Filtered
           ON B.F1 = Filtered.F1 AND B.F3 = Filtered.F3
 GROUP BY B.F2
 ORDER BY CNT DESC

I think your original query might have a bug, like Fred mentioned. The count of F2="B" should be 6, not 8, in your example, is that right? If 8 is intended, take out DISTINCT.

Another thing you might try is to make TEST table's clustered index to be (F2, F1, F3), and make another non-clustered index on (F1, F3).

Sample code is also available on SqlFiddle.

like image 40
kennethc Avatar answered Sep 28 '22 09:09

kennethc