Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does my defactored SQL run orders of magnitudes quicker?

That is, why does this:

select *
    from tableA
        /* Bunch of inner joins */
    where 
        /* Bunch of clauses */
    and (
        exists (
            select * 
                from tableB, tableC, tableD
                where (tableB.fieldNameA = 'foo') and
                   /* More clauses */
        ) or 
        exists (
            select * 
                from tableB, tableC, tableD
                where (tableB.fieldNameA = 'bar') and
                    /* More clauses */
        )
    )

Run almost 500 times faster than this?

select *
    from tableA
        /* Bunch of inner joins */
    where
        /* Bunch of clauses */
    and exists (
        select * 
            from tableB, tableC, tableD
            where (tableB.fieldNameA = 'foo' or tableB.fieldNameA = 'bar') and
                /* More clauses */
    )

I like the idea of not repeating code, so wanted to consolidate to the second version. They both produce the same result set, but the first runs so much faster I can't not use it. Thoughts?


I've been examining the query plans, but didn't really get the bottom of it. One thing I did notice was that for the factored example the Clustered Index Seek at [tree].[PK__tree__09746778] had an actual number of rows near 93,000. The other example doesn't come close to line counts this high. Here is the text output—not so sure how useful this is.

First example (defactored) produces:

|--Nested Loops(Left Semi Join, OUTER REFERENCES:([initialCategory].[inode]))
   |--Nested Loops(Inner Join, OUTER REFERENCES:([OPUSDev2].[dbo].[tree].[child]))
   |    |--Hash Match(Inner Join, HASH:([OPUSDev2].[dbo].[course].[inode])=([OPUSDev2].[dbo].[tree].[parent]), RESIDUAL:([OPUSDev2].[dbo].[course].[inode]=[OPUSDev2].[dbo].[tree].[parent]))
   |    |    |--Nested Loops(Inner Join, OUTER REFERENCES:([OPUSDev2].[dbo].[section].[inode], [Expr1037]) WITH UNORDERED PREFETCH)
   |    |    |    |--Nested Loops(Inner Join, OUTER REFERENCES:([OPUSDev2].[dbo].[course_term].[inode], [Expr1036]) WITH UNORDERED PREFETCH)
   |    |    |    |    |--Hash Match(Inner Join, HASH:([OPUSDev2].[dbo].[course].[course_number])=([OPUSDev2].[dbo].[course_term].[course_number]), RESIDUAL:([OPUSDev2].[dbo].[course_term].[course_number]=[OPUSDev2].[dbo].[course].[course_number]))
   |    |    |    |    |    |--Clustered Index Scan(OBJECT:([OPUSDev2].[dbo].[course].[PK__course__31B762FC]), WHERE:(CONVERT_IMPLICIT(tinyint,[OPUSDev2].[dbo].[course].[show_on_web],0)=(1) AND CONVERT_IMPLICIT(tinyint,[OPUSDev2].[dbo].[course].[show_on_nav],0)=(1)))
   |    |    |    |    |    |--Hash Match(Inner Join, HASH:([OPUSDev2].[dbo].[term].[term_id])=([OPUSDev2].[dbo].[course_term].[term_id]))
   |    |    |    |    |         |--Clustered Index Scan(OBJECT:([OPUSDev2].[dbo].[term].[PK__term__0697FACD]), WHERE:(CONVERT_IMPLICIT(tinyint,[OPUSDev2].[dbo].[term].[active_on_web],0)=(1)))
   |    |    |    |    |         |--Clustered Index Scan(OBJECT:([OPUSDev2].[dbo].[course_term].[course_term_pk]))
   |    |    |    |    |--Index Seek(OBJECT:([OPUSDev2].[dbo].[section].[section_idx2]), SEEK:([OPUSDev2].[dbo].[section].[course_term_inode]=[OPUSDev2].[dbo].[course_term].[inode]) ORDERED FORWARD)
   |    |    |    |--Clustered Index Seek(OBJECT:([OPUSDev2].[dbo].[section].[PK__section__7B264821]), SEEK:([OPUSDev2].[dbo].[section].[inode]=[OPUSDev2].[dbo].[section].[inode]),  WHERE:([OPUSDev2].[dbo].[section].[status]='Active') LOOKUP ORDERED FORWARD)
   |    |    |--Clustered Index Scan(OBJECT:([OPUSDev2].[dbo].[tree].[PK__tree__09746778]))
   |    |--Clustered Index Seek(OBJECT:([OPUSDev2].[dbo].[category].[PK__category__2739D489] AS [initialCategory]), SEEK:([initialCategory].[inode]=[OPUSDev2].[dbo].[tree].[child]),  WHERE:([OPUSDev2].[dbo].[category].[active] as [initialCategory].[active]=(1)) ORDERED FORWARD)
   |--Concatenation
        |--Nested Loops(Inner Join)
        |    |--Clustered Index Seek(OBJECT:([OPUSDev2].[dbo].[category].[PK__category__2739D489] AS [childCategory]), SEEK:([childCategory].[inode]=[OPUSDev2].[dbo].[category].[inode] as [initialCategory].[inode]) ORDERED FORWARD)
        |    |--Nested Loops(Inner Join, OUTER REFERENCES:([parentCategory].[inode]))
        |         |--Index Seek(OBJECT:([OPUSDev2].[dbo].[category].[idx_category_2] AS [parentCategory]), SEEK:([parentCategory].[category_key]='noncredit_subjects') ORDERED FORWARD)
        |         |--Clustered Index Seek(OBJECT:([OPUSDev2].[dbo].[tree].[PK__tree__09746778]), SEEK:([OPUSDev2].[dbo].[tree].[child]=[OPUSDev2].[dbo].[category].[inode] as [initialCategory].[inode] AND [OPUSDev2].[dbo].[tree].[parent]=[OPUSDev2].[dbo].[category].[inode] as [parentCategory].[inode]) ORDERED FORWARD)
        |--Nested Loops(Inner Join)
             |--Clustered Index Seek(OBJECT:([OPUSDev2].[dbo].[category].[PK__category__2739D489] AS [childCategory]), SEEK:([childCategory].[inode]=[OPUSDev2].[dbo].[category].[inode] as [initialCategory].[inode]) ORDERED FORWARD)
             |--Nested Loops(Inner Join, OUTER REFERENCES:([parentCategory].[inode]))
                  |--Index Seek(OBJECT:([OPUSDev2].[dbo].[category].[idx_category_2] AS [parentCategory]), SEEK:([parentCategory].[category_key]='credit_subjects') ORDERED FORWARD)
                  |--Clustered Index Seek(OBJECT:([OPUSDev2].[dbo].[tree].[PK__tree__09746778]), SEEK:([OPUSDev2].[dbo].[tree].[child]=[OPUSDev2].[dbo].[category].[inode] as [initialCategory].[inode] AND [OPUSDev2].[dbo].[tree].[parent]=[OPUSDev2].[dbo].[category].[inode] as [parentCategory].[inode]) ORDERED FORWARD)

Second example (factored) produces:

|--Nested Loops(Left Semi Join, OUTER REFERENCES:([initialCategory].[inode]))
   |--Nested Loops(Inner Join, OUTER REFERENCES:([OPUSDev2].[dbo].[tree].[child]))
   |    |--Hash Match(Inner Join, HASH:([OPUSDev2].[dbo].[course].[inode])=([OPUSDev2].[dbo].[tree].[parent]), RESIDUAL:([OPUSDev2].[dbo].[course].[inode]=[OPUSDev2].[dbo].[tree].[parent]))
   |    |    |--Nested Loops(Inner Join, OUTER REFERENCES:([OPUSDev2].[dbo].[section].[inode], [Expr1029]) WITH UNORDERED PREFETCH)
   |    |    |    |--Nested Loops(Inner Join, OUTER REFERENCES:([OPUSDev2].[dbo].[course_term].[inode], [Expr1028]) WITH UNORDERED PREFETCH)
   |    |    |    |    |--Hash Match(Inner Join, HASH:([OPUSDev2].[dbo].[course].[course_number])=([OPUSDev2].[dbo].[course_term].[course_number]), RESIDUAL:([OPUSDev2].[dbo].[course_term].[course_number]=[OPUSDev2].[dbo].[course].[course_number]))
   |    |    |    |    |    |--Clustered Index Scan(OBJECT:([OPUSDev2].[dbo].[course].[PK__course__31B762FC]), WHERE:(CONVERT_IMPLICIT(tinyint,[OPUSDev2].[dbo].[course].[show_on_web],0)=(1) AND CONVERT_IMPLICIT(tinyint,[OPUSDev2].[dbo].[course].[show_on_nav],0)=(1)))
   |    |    |    |    |    |--Hash Match(Inner Join, HASH:([OPUSDev2].[dbo].[term].[term_id])=([OPUSDev2].[dbo].[course_term].[term_id]))
   |    |    |    |    |         |--Clustered Index Scan(OBJECT:([OPUSDev2].[dbo].[term].[PK__term__0697FACD]), WHERE:(CONVERT_IMPLICIT(tinyint,[OPUSDev2].[dbo].[term].[active_on_web],0)=(1)))
   |    |    |    |    |         |--Clustered Index Scan(OBJECT:([OPUSDev2].[dbo].[course_term].[course_term_pk]))
   |    |    |    |    |--Index Seek(OBJECT:([OPUSDev2].[dbo].[section].[section_idx2]), SEEK:([OPUSDev2].[dbo].[section].[course_term_inode]=[OPUSDev2].[dbo].[course_term].[inode]) ORDERED FORWARD)
   |    |    |    |--Clustered Index Seek(OBJECT:([OPUSDev2].[dbo].[section].[PK__section__7B264821]), SEEK:([OPUSDev2].[dbo].[section].[inode]=[OPUSDev2].[dbo].[section].[inode]),  WHERE:([OPUSDev2].[dbo].[section].[status]='Active') LOOKUP ORDERED FORWARD)
   |    |    |--Clustered Index Scan(OBJECT:([OPUSDev2].[dbo].[tree].[PK__tree__09746778]))
   |    |--Clustered Index Seek(OBJECT:([OPUSDev2].[dbo].[category].[PK__category__2739D489] AS [initialCategory]), SEEK:([initialCategory].[inode]=[OPUSDev2].[dbo].[tree].[child]),  WHERE:([OPUSDev2].[dbo].[category].[active] as [initialCategory].[active]=(1)) ORDERED FORWARD)
   |--Nested Loops(Inner Join)
        |--Clustered Index Seek(OBJECT:([OPUSDev2].[dbo].[category].[PK__category__2739D489] AS [childCategory]), SEEK:([childCategory].[inode]=[OPUSDev2].[dbo].[category].[inode] as [initialCategory].[inode]) ORDERED FORWARD)
        |--Nested Loops(Inner Join, OUTER REFERENCES:([OPUSDev2].[dbo].[tree].[parent]))
             |--Clustered Index Seek(OBJECT:([OPUSDev2].[dbo].[tree].[PK__tree__09746778]), SEEK:([OPUSDev2].[dbo].[tree].[child]=[OPUSDev2].[dbo].[category].[inode] as [initialCategory].[inode]) ORDERED FORWARD)
             |--Index Seek(OBJECT:([OPUSDev2].[dbo].[category].[idx_category_2] AS [parentCategory]), SEEK:([parentCategory].[category_key]='credit_subjects' AND [parentCategory].[inode]=[OPUSDev2].[dbo].[tree].[parent] OR [parentCategory].[category_key]='noncredit_subjects' AND [parentCategory].[inode]=[OPUSDev2].[dbo].[tree].[parent]) ORDERED FORWARD)
like image 373
Matt Avatar asked Dec 04 '12 21:12

Matt


1 Answers

The only way to know for sure why there is a speed difference is to view the query plans for each query and see what the optimiser is doing differently. Empirically, others have found that changing OR clauses to UNION clauses have helped the optimiser to better use indexes, thus improving query execution time.

The article here gives a very good explanation as to some of the variables that affect a query plan including selectivity.

Why UNION causes more seeks instead of scans is because each operation needs to meet a certain selectivity requirement in order to qualify for a seek. (Selectivity is the uniqueness of the particular column being filtered). OR’s happen in a single operation so when the selectivity for each column is combined and it goes over a certain percentage then a scan is deemed more efficient.

You seem to think the following block makes all the difference:

where (table.fieldNameA = 'foo' or table.fieldNameA = 'bar')

Since a UNION by default performs a separate operation for each statement, the selectivity of each column is not combined giving it a greater chance of performing a seek. Now since the UNION performs two operations, they need to match their result sets using a concatenation operation above. Generally this is not an expensive operation.

Why I'm even mentioning UNION here is because your optimised SQL behaves very similarly to a UNION.

exists (
            select * 
                from tableB, tableC, tableD
                where (table.fieldNameA = 'foo') and
                   /* More clauses */
        ) or 
        exists (
            select * 
                from tableB, tableC, tableD
                where (table.fieldNameA = 'bar') and
                    /* More clauses */
        )

You have two separate sub-queries, each of which can be parallelised, and then joined together on completion. Each of those sub-queries also has a better selectivity ratio, encouraging the use of seeks rather than scans.

This is supported by your (updated) query plans. In your first plan, at the bottom, you have this:

Concatenation
        |--Nested Loops(Inner Join)
        |    |--Clustered Index Seek
        |    |--Nested Loops
        |         |--Index Seek
        |         |--Clustered Index Seek
        |--Nested Loops(Inner Join)
             |--Clustered Index Seek
             |--Nested Loops
                  |--Index Seek
                  |--Clustered Index Seek

Showing that the OR condition has been split in to two separate jobs that can be parallelised, with the index seek quite probably performing better as the selectivity will be improved.

Whenever you have a poorly performing query, it's necessary to view the query plan to see which sub-sections are taking a very long time, and attempting to fix those first. Very similar to how you'd optimise a program for performance.

Queries can be very hard to optimise by eye as so many variables can affect the query plan, like the number of rows in a table, indexes available and the selectivity of certain columns.

like image 132
Josh Smeaton Avatar answered Oct 23 '22 04:10

Josh Smeaton