
Wrong index being used when selecting top rows

I have a simple query that selects the top 200 rows, ordered by one column and filtered on another, indexed column. What confuses me is that the query plan in PL/SQL Developer shows the index on the filter column being used only when I select all rows, not when I limit the result, e.g.:

SELECT * FROM
(
 SELECT *
 FROM cr_proposalsearch ps
 WHERE UPPER(ps.customerpostcode) like 'MK3%'
 ORDER BY ps.ProposalNumber DESC
)
WHERE ROWNUM <= 200

The plan shows that it uses index CR_PROPOSALSEARCH_I1, which is an index on two columns, PROPOSALNUMBER and UPPER(CUSTOMERNAME). The query takes 0.985s to execute. (Screenshot: plan for the query with ROWNUM.)

If I get rid of the ROWNUM condition, the plan is what I expect and the query executes in 0.343s. (Screenshot: plan for the query without ROWNUM.)

Here, index XIF25CR_PROPOSALSEARCH is on CR_PROPOSALSEARCH (UPPER(CUSTOMERPOSTCODE)).

How come?

EDIT: I have gathered statistics on the cr_proposalsearch table, and both query plans now show that they use the XIF25CR_PROPOSALSEARCH index.
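
For reference, gathering statistics as described in the edit might look like the sketch below. It assumes the table is owned by the current user. The original post does not say which options were used, so the cascade setting is an assumption, added here so that the index statistics are refreshed along with the table statistics.

```sql
-- Sketch only: assumes CR_PROPOSALSEARCH is owned by the current user.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'CR_PROPOSALSEARCH',
    cascade => TRUE   -- assumption: also gather statistics on the table's indexes
  );
END;
/
```

Re-checking both plans afterwards shows whether the optimizer's choice changed, as it did for the asker.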

Ruslan asked Aug 02 '11 11:08



2 Answers

Including the ROWNUM changes the optimizer's calculations about which is the more efficient path.

When you do a top-n query like this, it doesn't necessarily mean that Oracle will get all the rows, fully sort them, then return the top ones. The COUNT STOPKEY operation in the execution plan indicates that Oracle will only perform the underlying operations until it has found the number of rows you asked for.

The optimizer has calculated that the full query will acquire and sort 77K rows. If it used this plan for the top-n query, it would have to do a large sort of those rows to find the top 200 (it wouldn't necessarily have to fully sort them, as it wouldn't care about the exact order of rows past the top; but it would have to look over all of those rows).

The plan for the top-n query uses the other index to avoid having to sort at all. It considers each row in ProposalNumber order, checks whether it matches the predicate, and if so returns it. Once it has returned 200 rows, it's done. The optimizer has calculated that this will be more efficient for getting a small number of rows. (It may not be right, of course; the timings in the question, 0.985s with ROWNUM versus 0.343s without, may suggest the estimate was off here.)
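
To see these operations outside PL/SQL Developer, a plan can be generated and displayed as sketched below. This assumes a plan table is available under the default name PLAN_TABLE. With the plan described in the question, one would expect a COUNT STOPKEY step over a descending scan of CR_PROPOSALSEARCH_I1 and no sort step, though the exact operation names in your output may differ.

```sql
-- Generate the plan for the top-n query from the question.
EXPLAIN PLAN FOR
SELECT * FROM
(
 SELECT *
 FROM cr_proposalsearch ps
 WHERE UPPER(ps.customerpostcode) LIKE 'MK3%'
 ORDER BY ps.ProposalNumber DESC
)
WHERE ROWNUM <= 200;

-- Show the most recently explained statement from the default plan table.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```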

If the optimizer were to choose this plan when you ask for all rows, it would have to read through the entire index in descending order, getting each row from the table by ROWID as it goes to check against the predicate. This would result in a lot of extra I/O and inspecting many rows that would not be returned. So in this case, it decides that using the index on customerpostcode is more efficient.

If you gradually increase the number of rows to be returned from the top-n query, you will probably find a tipping point where the plan switches from the first to the second. Just from the costs of the two plans, I'd guess this might be around 1,200 rows.
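
One way to look for that tipping point is to explain the same query with different row limits and compare the plans. In this sketch the statement IDs 'top_200' and 'top_2000' are arbitrary labels, and 2000 is simply a value above the guessed threshold; neither comes from the original post.

```sql
-- Plan for the original limit.
EXPLAIN PLAN SET STATEMENT_ID = 'top_200' FOR
SELECT * FROM
(
 SELECT *
 FROM cr_proposalsearch ps
 WHERE UPPER(ps.customerpostcode) LIKE 'MK3%'
 ORDER BY ps.ProposalNumber DESC
)
WHERE ROWNUM <= 200;

-- Plan for a larger limit (arbitrary value chosen for illustration).
EXPLAIN PLAN SET STATEMENT_ID = 'top_2000' FOR
SELECT * FROM
(
 SELECT *
 FROM cr_proposalsearch ps
 WHERE UPPER(ps.customerpostcode) LIKE 'MK3%'
 ORDER BY ps.ProposalNumber DESC
)
WHERE ROWNUM <= 2000;

-- Display each plan; compare which index is used and whether a sort appears.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'top_200', 'TYPICAL'));
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'top_2000', 'TYPICAL'));
```

Repeating this with limits between the two values narrows down where the optimizer switches plans.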

Dave Costa answered Oct 01 '22 18:10



If you are sure your stats are up to date and that the index is selective enough, you could tell Oracle to use the index with a hint:

SELECT  *
FROM   (SELECT /*+ index(ps XIF25CR_PROPOSALSEARCH) */  *
        FROM     cr_proposalsearch ps
        WHERE    UPPER (ps.customerpostcode) LIKE 'MK3%'
        ORDER BY ps.proposalnumber DESC)
WHERE  ROWNUM <= 200

(I would only recommend this approach as a last resort.)

If I were doing this, I would first run the query through tkprof to see how much work it is actually doing; for example, the estimated cost of the index range scans could be way off.
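
A minimal sketch of producing a trace for tkprof follows. It assumes you have the ALTER SESSION privilege and access to the trace directory on the database server. The identifier 'topn_test' and the report file name are made up for illustration, and the trace file name is left as a placeholder because it depends on your instance.

```sql
-- Tag the trace file so it is easy to find (the tag is an arbitrary label).
ALTER SESSION SET tracefile_identifier = 'topn_test';
ALTER SESSION SET sql_trace = TRUE;

-- Run the query to be measured.
SELECT * FROM
(
 SELECT *
 FROM cr_proposalsearch ps
 WHERE UPPER(ps.customerpostcode) LIKE 'MK3%'
 ORDER BY ps.ProposalNumber DESC
)
WHERE ROWNUM <= 200;

ALTER SESSION SET sql_trace = FALSE;

-- Then, on the database server, format the trace file, for example:
--   tkprof <trace_file>.trc topn_report.txt sys=no
```

The tkprof report lists the rows processed and the reads performed for each step of the plan, which is the actual work to compare against the optimizer's estimates.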

You should also check the actual cardinality:

SELECT count(*)  FROM cr_proposalsearch ps  WHERE UPPER(ps.customerpostcode) like 'MK3%' 

and then compare it to the cardinality in the query plan.
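
As an alternative to comparing the numbers by hand, Oracle 10g and later can report estimated and actual row counts side by side. The sketch below assumes the session can read the relevant V$ views and that the second statement runs immediately after the first in the same session (in SQL*Plus, with SERVEROUTPUT off).

```sql
-- Run the count while collecting row-source statistics.
SELECT /*+ gather_plan_statistics */ COUNT(*)
FROM cr_proposalsearch ps
WHERE UPPER(ps.customerpostcode) LIKE 'MK3%';

-- Show the plan of the last statement executed in this session,
-- with estimated (E-Rows) and actual (A-Rows) row counts per step.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

A large gap between E-Rows and A-Rows on the index step would point to the kind of misestimate that stale statistics can cause.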

Kevin Burton answered Oct 01 '22 19:10
