Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why Index is not used with subquery

Tags:

sql

firebird

This takes 0.001 seconds to execute and it uses index seek

SELECT * FROM CUSTOMER WHERE ID IN (1008,1122)

Now I have a stored procedure U_VIP which returns the same ID as example one (1008,1122), and it takes only 0.001 second to execute

SELECT ID FROM U_VIP    //returns (1008,1122)

Now when I combine them, it takes around half-a-second to execute and index is not used

SELECT * FROM CUSTOMER WHERE ID IN (SELECT ID FROM U_VIP)

I've simplified the example above, in actual application the performance is impacted by much higher magnitude. How to force Firebird to use index in this case?

**Using Firebird 2.1

** EDIT **

Base on Mark's answer, use JOIN does improve the execution time because it is now doing index seek.

SELECT CUSTOMER.* 
FROM CUSTOMER
INNER JOIN U_VIP ON U_VIP.ID = CUSTOMER.ID

This is great, however, it introduces another problem for me which I'll try to explain in the following example.

SELECT CUSTOMER.* 
FROM CUSTOMER
WHERE (:AREAID = 0 OR ID IN (SELECT ID FROM U_VIP(:AREAID)))

Using where clause, I can conditionally apply the filter base on whether :AREAID is supplied by the user. How do I achieve the same when I replace the where clause with a join?

Something like:

SELECT CUSTOMER.* 
FROM CUSTOMER
{IF :AREAID > 0 THEN}
INNER JOIN (SELECT ID FROM U_VIP(:AREAID)) VIP ON VIP.ID = CUSTOMER.ID
{END IF}

Which of course, Firebird dislikes the part with braces =/

like image 802
Kagawa Avatar asked Jul 18 '13 05:07

Kagawa


People also ask

Which Cannot be used in subquery?

You cannot include text, unitext, or image datatypes in subqueries. Subqueries cannot manipulate their results internally, that is, a subquery cannot include the order by clause, the compute clause, or the into keyword.

Why order is not used in subquery?

An ORDER BY command cannot be used in a subquery, although the main query can use an ORDER BY. The GROUP BY command can be used to perform the same function as the ORDER BY in a subquery. Subqueries that return more than one row can only be used with multiple value operators such as the IN operator.

What are the disadvantages of subquery?

Disadvantages of Subquery:The optimizer is more mature for MYSQL for joins than for subqueries, so in many cases a statement that uses a subquery can be executed more efficiently if you rewrite it as join. We cannot modify a table and select from the same table within a subquery in the same SQL statement.

Which clause is mandatory in subquery?

Which of the following clause is mandatorily used in a sub-query? Answer: A. A sub-query is just like any other query which has to start with a SELECT clause. They are contained within an outer query.


1 Answers

Instead of IN, you need to use EXISTS or an INNER JOIN. I am not entirely sure about the details, but I believe in your query the CUSTOMER table is fully read, evaluating the result of the subquery for every row (maybe even executing the subquery for every row). As the optimizer doesn't know the number of results of the subquery in advance, it can't create an optimization like it can if you use a fixed number of literal values like in your first query.

Try changing your query to:

SELECT * 
FROM CUSTOMER 
WHERE EXISTS (SELECT 1 FROM U_VIP WHERE U_VIP.ID = CUSTOMER.ID)

Or:

SELECT CUSTOMER.* 
FROM CUSTOMER
INNER JOIN U_VIP ON U_VIP.ID = CUSTOMER.ID

Or (changing the order sometimes can lead to better performance):

SELECT CUSTOMER.* 
FROM U_VIP
INNER JOIN CUSTOMER ON CUSTOMER.ID = U_VIP.ID

In general I would expect those queries to perform better than the query with IN.

Edit in response to update

Based on your updated question I can think of multiple solutions, I am not entirely sure on their performance though.

  • Use separate queries for :AREAID is 0 and :AREAID is not 0
  • Use a stored procedure or EXECUTE BLOCK with an EXECUTE STATEMENT with a dynamically built statement (variant of the previous)
  • Make the stored procedure U_VIP return all customers if :AREAID is 0
  • Use an additional JOIN condition OR :AREAID = 0; this might not yield results if U_VIP returns nothing for 0 (and might not perform *)
  • Use a LEFT JOIN and add WHERE U_VIP.ID IS NOT NULL OR :AREAID = 0 (might not perform *)
  • Use a UNION of the 'normal' query and a second query on CUSTOMER with WHERE :AREAID = 0 (might not perform *)

For (*) see the 'Smart logic' anti-pattern

For the dynamically built query you can think of something like:

EXECUTE BLOCK (INPUTCONDITION INTEGER = ?)
    RETURNS (ID INTEGER)
AS
    DECLARE VARIABLE QUERY VARCHAR(6400);
BEGIN
    QUERY = 'SELECT a.ID FROM SORT_TEST a';
    IF (INPUTCONDITION <> 0) then
        QUERY = QUERY || ' WHERE a.ID = ' || INPUTCONDITION;
    FOR EXECUTE STATEMENT QUERY INTO :ID
    DO
        SUSPEND;
END

In this example the value 0 for INPUTCONDITION will generate a query without WHERE-clause, and for other inputs a query with a WHERE-clause. Doing it like this is prone to SQL injection if the parameter is a (VAR)CHAR or BLOB, so be careful. You could also consider two branches where one uses EXECUTE STATEMENT with parameters and the other without.

Instead of EXECUTE BLOCK, you can also use a selectable procedure like you already use for U_VIP; EXECUTE BLOCK is essentially a stored procedure that isn't stored in the database.

See also 'Myth: dynamic SQL is slow'

like image 103
Mark Rotteveel Avatar answered Sep 20 '22 00:09

Mark Rotteveel